Abstract

Singular value decomposition is the key tool in the analysis and understanding of linear
regularization methods in Hilbert spaces. Besides simplifying computations, it provides a good
understanding of the properties of the forward problem compared to the prior information
introduced by the regularization methods. In the last decade nonlinear variational approaches
such as $\ell^1$ or total variation regularization became prominent regularization techniques,
with certain properties being superior to standard methods. In the analysis of those, singular
values and vectors have not played any role so far, for the obvious reason that these problems
are nonlinear, together with the issue of defining singular values and singular vectors in the
first place.

In this paper, however, we start a study of singular values and vectors for nonlinear
variational regularization of linear inverse problems, with particular focus on singular
one-homogeneous regularization functionals. A major role is played by the smallest singular
value, which we define via the ground state of an appropriate functional combining the
(semi-)norm induced by the forward operator and the regularization functional. The optimality
condition for the ground state further yields a natural generalization to higher singular values
and vectors involving the subdifferential of the regularization functional, although we shall
see that the Rayleigh principle may fail for higher singular values.

Using those definitions of singular values and vectors, we shall carry over two main prop- 
erties from the world of linear regularization. The first one is gaining information about scale, 
respectively the behavior of regularization techniques at different scales. This also leads to 
novel estimates at different scales, generalizing the estimates for the coefficients in the linear 
singular value expansion. The second one is to provide classes of exact solutions for variational 
regularization methods. We will show that all singular vectors can be reconstructed up to a 
scalar factor by the standard Tikhonov-type regularization approach even in the presence of 
(small) noise. Moreover, we will show that they can even be reconstructed without any bias 
by the recently popularized inverse scale space method. 

Key words: Inverse Problems, Variational Regularization, Singular Values, Ground 
States, Total Variation Regularization, Bregman Distance, Inverse Scale Space Method, Com- 
pressed Sensing. 

1 Introduction 

Regularization methods and their analysis are a major topic in inverse problems and image
processing. In the last century mainly linear regularization methods for problems in Hilbert
spaces have been studied and analyzed, and it seems that for such methods a quite complete
theory is now available based on the singular value decomposition (cf. [64]) of the forward
operator in the norm defined by the regularization (cf. [36]), respectively generalizations to
spectral decompositions in the rare cases of non-compact forward operators (cf. [36]).
The research in the 21st century has significantly shifted from linear regularizations to nonlinear
approaches, in particular variational methods generalizing Tikhonov regularization, where singular
regularization functionals such as $\ell^1$-norms or the total variation are used. In many examples
the above-mentioned functionals have been shown to yield improved properties with respect to the
incorporation of prior knowledge and the quality of reconstructions, and have also been a key
tool in the adjacent theory of compressed sensing (cf. [25] [26l ESI E2]). Various advances in the 
analysis of such regularization methods have been made over the last years, ranging from basic 
regularization properties (cf. e.g. [TJ [23]) over error estimation (cf. |2121 125] 155 ] 157 ] PI 135 ] 135 ] ) 
to corrections of inherent bias by iterative and time-flow techniques (cf. [501 GH1 [T5J [TBI ITT]). 
Singular values and vectors have so far not played any role in the analysis of such methods, and
it is common belief that their use is restricted to linear regularization methods. This is not
surprising, since first of all it is not trivial to define a notion of singular values and to
characterize it in the nonlinear case. Moreover, it is obvious that due to missing linearity no
decomposition into singular values can be achieved. For these reasons the study of singular
values (or eigenvalues of regularization functionals) has been mainly abandoned in the inverse
problems community; studies of related nonlinear eigenvalue problems rather exist in nonlinear
partial differential equations and
functional inequalities (cf. [2 [El HSl |3H EH SH SSI SSI S3 EH! ) , in control theory (cf. [39]), in 
image processing (cf. HJ [S] [pj [57J ) and surprisingly in machine learning (cf. [T7MSS]). Motivated 
by those as well as general approaches to nonlinear eigenvalue problems we will define a ground 
state by a Rayleigh-type principle and further singular values and singular vectors by considering 
the first-order optimality condition for the non-convex variational problem defining ground states. 
The main results we shall derive are the following: 

• First of all, our definition of singular values and singular vectors is studied and demonstrated 
to be a meaningful extension of the linear case, although some properties can be lost in 
extreme cases, e.g. the discreteness of the spectrum and the Rayleigh principle for higher 
singular values (Section 3). 

• With the singular values and singular vectors we derive error estimates for appropriate linear 
functionals of the solution, which provide information about the behavior at different scales. 
This is made explicit for an example in total variation denoising (Section 4). 

• An important part is to verify that singular vectors are exact solutions of variational regular- 
ization schemes. This means that if the image of a singular vector under the forward operator 
is used for the reconstruction, the solution is a multiple of the singular vector. Surprisingly, 
under certain conditions particularly met for singular regularizations, the same holds true if 
a certain amount of noise is added. For inverse scale space methods we can further verify 
that singular vectors are reconstructed without bias, i.e. after finite time (depending on 
the singular value) the solution of the inverse scale space equals exactly the singular vector, 
without a multiplicative change (Sections 5 and 7). 

• We derive estimates on the bias of variational regularization schemes, which show that the 
minimal bias is somehow defined by the ground state, respectively by the smallest singular 
value in our definition (Section 6). 

• We provide a variety of examples of inverse problems and regularization functionals for
which singular values and singular vectors can be computed explicitly. This allows us to draw
various conclusions about the behavior of the regularization and the typical shape of preferred
solutions (Section 8).

2 Notations and Assumptions 

To fix notation, we consider linear inverse problems of the form 

Ku = f, (2.1) 
with $K : \mathcal{U} \to \mathcal{H}$ being a linear operator mapping from a Banach space
$\mathcal{U}$ to a Hilbert space $\mathcal{H}$, with the goal of recovering $u$ from (2.1) given
data $f$ that are potentially corrupted by noise. We are mainly interested in the case of $K$
being compact, in particular continuous from the weak or weak-* topology of $\mathcal{U}$ to the
strong topology of $\mathcal{H}$, which creates ill-posedness of the inverse problem.

Nonlinear variational regularization methods for computing robust approximate solutions of
(2.1) are of the form

$$\hat u \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \tfrac{1}{2} \|Ku - f\|_{\mathcal{H}}^2 + \alpha J(u) \right\}, \qquad (2.2)$$

with $J : \mathrm{dom}(J) \subseteq \mathcal{U} \to \mathbb{R} \cup \{+\infty\}$ being a so-called
regularization functional that incorporates the a-priori knowledge, and $\alpha \in \mathbb{R}_{>0}$
denoting the regularization parameter that controls the impact of $J$ on the solution $\hat u$ of
(2.2). Note that in the variational approach linear regularization methods are related to
quadratic regularization functionals like

$$J(u) = \tfrac{1}{2} \|Du\|_{\mathcal{U}}^2$$

for linear operators $D : \mathcal{U} \to \mathcal{U}$, since they lead to linear optimality
conditions. Another classical choice motivated from statistical mechanics and information theory
is the Boltzmann (Shannon) entropy regularization functional, in $\mathcal{U} = L^1(\Omega)$,

$$J(u) = \int_\Omega u \log(u) - u \, dx,$$

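leading to the so-called maximum entropy regularization (cf. [34]). Recently popular functionals
are non-differentiable regularization energies like the one-norm $J(u) = \|u\|_{\ell^1}$ in
$\mathcal{U} = \mathbb{R}^N$ or the total variation $J(u) = \mathrm{TV}(u)$, being defined as

$$\mathrm{TV}(u) := \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^n),\ \|\varphi\|_{L^\infty(\Omega; \mathbb{R}^n)} \le 1} \int_\Omega u \, \operatorname{div} \varphi \, dx. \qquad (2.3)$$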

Total variation regularization became popular in the Rudin-Osher-Fatemi (ROF) model [SS] 

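$$\hat u \in \operatorname*{arg\,min}_{u \in \mathrm{BV}(\Omega)} \left\{ \tfrac{1}{2} \|u - f\|_{L^2(\Omega)}^2 + \alpha \, \mathrm{TV}(u) \right\}. \qquad (2.4)$$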



The space $\mathrm{BV}(\Omega)$ is the space of all functions $u \in L^1(\Omega)$ such that
$\mathrm{TV}(u)$ is finite. In case of $\Omega \subset \mathbb{R}^n$ for $n \in \{1, 2\}$ the
space $\mathrm{BV}(\Omega)$ can be embedded into $L^2(\Omega)$.
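For the one-norm functional mentioned above, the special case $K = I$ on $\mathbb{R}^N$ admits a closed-form minimizer of the variational problem (2.2): componentwise soft-thresholding, a standard fact from convex analysis. The following minimal sketch (the function names `soft_threshold` and `objective` are illustrative and not taken from the paper) evaluates this minimizer and makes the functional in (2.2) concrete.

```python
def soft_threshold(f, alpha):
    """Minimizer of 0.5*||u - f||_2^2 + alpha*||u||_1, i.e. (2.2) with K = I."""
    return [max(abs(x) - alpha, 0.0) * (1.0 if x > 0 else -1.0) for x in f]


def objective(u, f, alpha):
    """Value of the functional in (2.2) for K = I and J = one-norm."""
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    return fidelity + alpha * sum(abs(ui) for ui in u)


f = [3.0, -0.5, 1.2, 0.0]   # hypothetical data
alpha = 1.0                 # hypothetical regularization parameter
u_hat = soft_threshold(f, alpha)
```

Every entry of the data is shrunk towards zero by $\alpha$, and entries with magnitude below $\alpha$ are set to zero, which illustrates the systematic bias of such variational schemes.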



Since we are going to deal with rather large classes of convex functionals $J$, let us recall some
basic facts from convex analysis (cf. [58], [35] for detailed discussions). As usual for a Banach
space $\mathcal{U}$, the Banach space of bounded linear mappings from $\mathcal{U}$ to $\mathbb{R}$
is called the dual space of $\mathcal{U}$ and is denoted by $\mathcal{U}^*$, with norm

$$\|p\|_{\mathcal{U}^*} = \sup_{\|u\|_{\mathcal{U}} = 1} |p(u)| = \sup_{u \in \mathcal{U} \setminus \{0\}} \frac{|p(u)|}{\|u\|_{\mathcal{U}}} = \sup_{\|u\|_{\mathcal{U}} \le 1} |p(u)|.$$

The functional $p(u) = \langle p, u \rangle_{\mathcal{U}^* \times \mathcal{U}}$ is called the dual
product. Throughout this work we denote the dual product simply by
$\langle p, u \rangle_{\mathcal{U}}$. In case that $\mathcal{U}$ is even a Hilbert space, the dual
product can be identified with the scalar product of $\mathcal{U}$.

The characterization of dual spaces and its elements allows us to define the subdifferential of 
a convex functional. 
Definition 1 (Subdifferential). Let $\mathcal{U}$ be a Banach space with dual space
$\mathcal{U}^*$, and let the proper functional $J : \mathcal{U} \to \mathbb{R} \cup \{+\infty\}$ be
convex. Then $J$ is called subdifferentiable at $u \in \mathcal{U}$ if there exists an element
$p \in \mathcal{U}^*$ such that

$$J(v) - J(u) - \langle p, v - u \rangle_{\mathcal{U}} \ge 0$$

holds for all $v \in \mathcal{U}$. Furthermore, we call $p$ a subgradient at position $u$. The
collection of all subgradients at position $u$, i.e.

$$\partial J(u) := \left\{ p \in \mathcal{U}^* \mid J(v) - J(u) - \langle p, v - u \rangle_{\mathcal{U}} \ge 0, \ \forall v \in \mathcal{U} \right\} \subset \mathcal{U}^*,$$

is called the subdifferential of $J$ at $u$.

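We further mention that the subdifferential of one-homogeneous functionals can be characterized
as

$$\partial J(u) = \left\{ p \in \mathcal{U}^* \mid \langle p, u \rangle = J(u), \ \langle p, v \rangle \le J(v), \ \forall v \in \mathcal{U} \right\} \subset \mathcal{U}^*. \qquad (2.5)$$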

Another concept we shall use in several arguments is the notion of (generalized) Bregman
distances, defined as

$$D_J^p(v, u) = J(v) - J(u) - \langle p, v - u \rangle_{\mathcal{U}},$$

with $p \in \partial J(u)$. Bregman distances are not distances in the usual sense, since they do
not satisfy a triangle inequality and are not symmetric in general. However, for $J$ being convex
they are non-negative and satisfy $D_J^p(u, u) = 0$. Symmetry can be restored by using symmetric
Bregman distances, i.e.

$$D_J^{\mathrm{symm}}(v, u) = D_J^p(v, u) + D_J^q(u, v) = \langle q - p, v - u \rangle_{\mathcal{U}}$$

for $p \in \partial J(u)$ and $q \in \partial J(v)$.
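As a small illustration of these notions, the following sketch evaluates Bregman distances for the one-norm on $\mathbb{R}^3$. It assumes the convention $D_J^p(v, u) = J(v) - J(u) - \langle p, v - u \rangle$ with $p \in \partial J(u)$ and uses the particular subgradient $\mathrm{sign}(u)$ (with value 0 at zero entries); all function names and the test vectors are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def l1(u):
    return sum(abs(x) for x in u)


def dot(p, u):
    return sum(a * b for a, b in zip(p, u))


def subgradient_l1(u):
    """One particular subgradient of the one-norm at u (0 chosen at zero entries)."""
    return [float(sign(x)) for x in u]


def bregman(v, u, p):
    """D_J^p(v, u) = J(v) - J(u) - <p, v - u> for J = one-norm and p in dJ(u)."""
    return l1(v) - l1(u) - dot(p, [a - b for a, b in zip(v, u)])


u = [1.0, -2.0, 0.0]
v = [-0.5, -1.0, 3.0]
p = subgradient_l1(u)      # p in dJ(u)
q = subgradient_l1(v)      # q in dJ(v)
d_vu = bregman(v, u, p)
d_uv = bregman(u, v, q)
symmetric = d_vu + d_uv    # should equal <q - p, v - u>
```

The two one-sided distances differ here, while their sum agrees with $\langle q - p, v - u \rangle$; the subgradient also satisfies $\langle p, u \rangle = J(u)$, as expected for a one-homogeneous functional.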

Before we continue with the definition of ground states and singular vectors for general convex 
and subdifferentiable regularization functionals, we want to precisely define the class of operators 
and functionals we are going to investigate. Thus, for the remainder of this work we will assume 
the following properties without further notice: 

Assumption 1 (Setup). 

• $\Omega \subset \mathbb{R}^d$, $X \subset \mathbb{R}^k$ are bounded domains.

• $\mathcal{U}$ is a Banach space, being the dual of some other Banach space.

• $\mathcal{H}$ is a Hilbert space.

• $K : \mathcal{U} \to \mathcal{H}$ is a bounded linear operator mapping between these spaces.

• $J : \mathrm{dom}(J) \subseteq \mathcal{U} \to \mathbb{R} \cup \{+\infty\}$ is a proper, non-negative, convex functional.

3 Ground States and Singular Vectors 

In this section we want to define ground states of regularization functionals as well as an analogue 
of singular vectors for nonlinear functionals. 
3.1 Ground States

We start with a definition of a ground state, which is motivated by similar properties in partial 
differential equations, e.g. the classical notion of a ground state of the Schrödinger equation and
related problems (cf. [12, 68, HJ). In order to obtain a ground state we normalize the element u 
and minimize the regularization functional among those elements, i.e. $u_0$ is defined as

$$u_0 \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J),\ \|Ku\|_{\mathcal{H}} = 1} \{ J(u) \}. \qquad (3.1)$$



In the context of variational schemes like (2.2) we are particularly interested in non-trivial
ground states of one-homogeneous regularization functionals. A trivial ground state appears if
$J(u_0) = 0$, and we can immediately provide a well-known example for such:

Example 1. Let $\Omega \subset \mathbb{R}^d$ with $d \in \{1, 2\}$. Then we know that
$\mathrm{BV}(\Omega) \subset L^2(\Omega)$ holds. Thus, for $K = I$ being the identity operator
$I : L^2(\Omega) \to L^2(\Omega)$, a trivial ground state of $J = \mathrm{TV}$ is the constant
function $u_0 = 1/\sqrt{|\Omega|}$, since we have $\mathrm{TV}(u_0) = 0$ and
$\|u_0\|_{L^2(\Omega)} = 1$. Note that as usual the ground state is not unique, since $-u_0$ is a
ground state as well.

However, in many cases trivial ground states do not give interesting insights into the nature 
of a regularization energy, as the previous example shows. Thus, we would like to investigate 
non-trivial ground states that are orthogonal to the trivial ones in a reasonable sense. Let us 
therefore define some preliminary notions first. The $K$-product of two elements
$u, v \in \mathcal{U}$ is defined as

$$\langle u, v \rangle_K := \langle Ku, Kv \rangle_{\mathcal{H}}.$$

Furthermore we write $\|u\|_K$ as an abbreviation for $\sqrt{\langle u, u \rangle_K}$. This
particular definition of a scalar product for elements of a Banach space allows us to define a
useful orthogonal complement of the kernel of a regularization functional, which we define as
usual via

$$\ker(J)^\perp := \{ u \in \mathrm{dom}(J) \mid \langle u, v \rangle_K = 0, \ \forall v \in \ker(J) \}. \qquad (3.2)$$

For completeness we also introduce

$$\ker(J) := \{ u \in \mathrm{dom}(J) \mid J(u) = 0 \}.$$

In the case of convex non-negative one-homogeneous functionals we are mainly interested in, 
the kernel and also its complement can further be characterized as linear subspaces: 

Lemma 1. Let $J$ be convex, non-negative and one-homogeneous. Then $\ker(J)$ is a linear subspace.

Proof. Let $u, v \in \ker(J)$ and $a, b \in \mathbb{R}$ such that $|a| + |b| \neq 0$. With
$\alpha = \frac{|a|}{|a| + |b|} \in [0, 1]$, $\tilde u = \mathrm{sign}(a) u$ and
$\tilde v = \mathrm{sign}(b) v$ we obtain

$$0 \le J(au + bv) = J\big( (|a| + |b|)(\alpha \tilde u + (1 - \alpha) \tilde v) \big) = (|a| + |b|) \, J(\alpha \tilde u + (1 - \alpha) \tilde v) \le (|a| + |b|) \big( \alpha J(\tilde u) + (1 - \alpha) J(\tilde v) \big),$$

by using the one-homogeneity and convexity of $J$. Since $J(\tilde u) = J(u) = 0$ and
$J(\tilde v) = J(v) = 0$ hold due to the one-homogeneity of $J$, we conclude $J(au + bv) = 0$ as
well. Thus, $au + bv \in \ker(J)$ holds true. □

Example 2. Considering $J = \mathrm{TV}$ again, we easily see that $\ker(\mathrm{TV})$ equals the
set of all constant functions, because the determination of the kernel can simply be reduced to
determining $\ker(\nabla)$.
Figure 1: The function $u_a$ as defined in (3.5), for $a = 1/2$. This function is a ground state
of $K = I$, $J = \mathrm{TV}$, according to Definition 2.



Definition 2 (Ground State). Under the above assumptions on $J$ and $K$, a ground state $u_0$ is
defined as an element

$$u_0 \in \operatorname*{arg\,min}_{u \in \ker(J)^\perp,\ \|Ku\|_{\mathcal{H}} = 1} \{ J(u) \}. \qquad (3.3)$$

Moreover, if $u_0$ exists we call

$$\lambda_0 := J(u_0) \qquad (3.4)$$

the smallest singular value.

Under standard assumptions on variational regularization methods, the ground state indeed 
exists, which can be verified by usual arguments: 

Theorem 1. Let $d$ be a metric on $\mathcal{U}$, let $K$ be continuous from this metric topology
to the strong topology of $\mathcal{H}$, and let $J$ be lower semicontinuous with respect to this
metric topology. Moreover, let the sublevel sets of $u \mapsto \|Ku\|_{\mathcal{H}} + J(u)$ be
compact in the metric topology. Then there exists at least one ground state
$u_0 \in \mathcal{U}$.

We want to give a brief example of ground states in case of $J(u) = \mathrm{TV}(u)$.

Example 3. For $\Omega = [0, 1] \subset \mathbb{R}$, $K = I$ and $J(u) = \mathrm{TV}(u)$ we want
to consider

$$u_a(x) := \begin{cases} \sqrt{\frac{a}{1 - a}} & x > a \\ -\sqrt{\frac{1 - a}{a}} & x < a \end{cases} \qquad (3.5)$$

for $a \in (0, 1)$. We easily see that $u_a$ is orthogonal to the kernel of TV, which consists of
all constant functions due to Example 2, since $\int_0^1 u_a(x) \, dx = 0$ holds. Moreover, $u_a$
satisfies the normalization constraint $\|u_a\|_{L^2([0,1])} = 1$ for every $a \in ]0, 1[$.
However, the TV-value $\mathrm{TV}(u_a) = \frac{1}{\sqrt{a(1 - a)}}$ is a strictly convex function
in $a$ with unique minimum at $a = 1/2$. Thus, $u_0 = u_{1/2}$, which is visualized in Figure 1,
is a ground state if we can prove that there does not exist a function $u$ with
$\langle u, 1 \rangle_{L^2([0,1])} = 0$, $\|u\|_{L^2([0,1])} = 1$ and $\mathrm{TV}(u) < 2$.
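The claims of Example 3 can be checked with a few lines of arithmetic. The sketch below assumes that the one-jump function takes the value $-\sqrt{(1-a)/a}$ to the left of $a$ and $\sqrt{a/(1-a)}$ to the right of $a$, which is the piecewise constant function determined (up to sign) by the zero-mean and unit-norm constraints; the function names and the grid of candidate jump positions are illustrative.

```python
import math


def plateau_values(a):
    """Values of the one-jump function on [0, a) and (a, 1] (assumed form)."""
    left = -math.sqrt((1.0 - a) / a)
    right = math.sqrt(a / (1.0 - a))
    return left, right


def mean(a):
    left, right = plateau_values(a)
    return a * left + (1.0 - a) * right


def l2_norm_squared(a):
    left, right = plateau_values(a)
    return a * left ** 2 + (1.0 - a) * right ** 2


def total_variation(a):
    """The total variation of a function with a single jump is the jump height."""
    left, right = plateau_values(a)
    return right - left


candidates = [k / 8 for k in range(1, 8)]
tv_values = {a: total_variation(a) for a in candidates}
best_a = min(tv_values, key=tv_values.get)  # jump position with the smallest TV
```

Among the sampled jump positions the total variation is smallest at $a = 1/2$, where it equals 2. This only compares one-jump functions with each other; ruling out all other competitors is the purpose of the lemma that follows.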

Lemma 1. There exists no function $u \in C([0, 1])$ with $\|u\|_{L^2([0,1])} = 1$ and
$\langle 1, u \rangle_{L^2([0,1])} = 0$ such that $\mathrm{TV}(u) < 2$ holds.



Proof. It is easy to see that for the monotonic rearrangement of an arbitrary function
$u \in C([0, 1])$, which we want to denote by $u^*$, we have (cf. e.g. [2])

$$\mathrm{TV}(u^*) \le \mathrm{TV}(u).$$

Thus, in the following we are going to consider monotonically increasing functions
$u^* \in C([0, 1])$ with $\|u^*\|_{L^2([0,1])} = 1$ and $\langle 1, u^* \rangle_{L^2([0,1])} = 0$
only, without loss of generality. Now we want to prove $\mathrm{TV}(u^*) \ge 2$ by contradiction
and therefore subdivide the proof into two parts. First of all we are going to prove the
inequalities

$$(u^*(1))^2 - 1 \ge \left( (u^*(1))^2 - (u^*(0))^2 \right) y \qquad (3.6)$$

and

$$1 \le \mathrm{TV}(u^*) \left( - \int_0^y u^*(x) \, dx \right), \qquad (3.7)$$

with $y$ denoting a root of $u^*$. Subsequently we are going to use (3.6) and (3.7) to conclude
the contradiction.

For a monotonically increasing function $u^*$ we can rewrite the normalization constraint
$\|u^*\|_{L^2([0,1])} = 1$ to

$$1 = \|u^*\|_{L^2([0,1])}^2 = \int_0^y (u^*(x))^2 \, dx + \int_y^1 (u^*(x))^2 \, dx \le y \, (u^*(0))^2 + (1 - y) \, (u^*(1))^2,$$

with $y$ denoting a root of $u^*$. Rearranging immediately yields (3.6).

Moreover, according to the second mean value theorem of integration there exists a
$\xi \in ]0, 1[$ such that we can rewrite the normalization constraint to

$$\begin{aligned} 1 &= \int_0^1 u^*(x) \, u^*(x) \, dx \\ &= u^*(0) \int_0^\xi u^*(x) \, dx + u^*(1) \int_\xi^1 u^*(x) \, dx \\ &= \underbrace{(u^*(1) - u^*(0))}_{= \mathrm{TV}(u^*)} \int_\xi^1 u^*(x) \, dx, \end{aligned}$$

where the last equality holds due to $\langle 1, u^* \rangle_{L^2([0,1])} = 0$. Since the value
of $\int_\xi^1 u^*(x) \, dx$ gets maximal for $\xi = y$ and since we know
$\int_y^1 u^*(x) \, dx = - \int_0^y u^*(x) \, dx$, we obtain (3.7).

Finally, we are now able to prove the lemma's statement by contradiction. We assume that $u^*$
satisfies $\mathrm{TV}(u^*) < 2$. The normalization constraint $\|u^*\|_{L^2([0,1])} = 1$,
however, implies either $u^*(1) > 1$ or $u^*(0) < -1$. Without loss of generality we assume
$u^*(1) > 1$. Thus, there exists a constant $c > 0$ such that $u^*(1) = 1 + c$ holds. Due to
$\mathrm{TV}(u^*) < 2$ this automatically implies $u^*(0) > -1 + c$ and
$(u^*(0))^2 < (-1 + c)^2$. Applying (3.6) therefore yields

$$(1 + c)^2 - 1 \ge \left( (1 + c)^2 - (1 - c)^2 \right) y,$$

which can be rewritten to $y \le c/4 + \frac{1}{2}$. We are therefore able to estimate (3.7) to
obtain

$$\begin{aligned} 1 &\le \mathrm{TV}(u^*) \left( - \int_0^y u^*(x) \, dx \right) \\ &\le \mathrm{TV}(u^*) \, y \, (-u^*(0)) < \mathrm{TV}(u^*) \, y \, (1 - c) \\ &\le \mathrm{TV}(u^*) \left( \frac{c}{4} + \frac{1}{2} \right) (1 - c) \\ &= \mathrm{TV}(u^*) \left( \frac{1}{2} - \frac{c}{4} - \frac{c^2}{4} \right) < \frac{1}{2} \mathrm{TV}(u^*), \end{aligned}$$

which yields $2 < \mathrm{TV}(u^*)$ and therefore is a contradiction to the assumption
$\mathrm{TV}(u^*) < 2$. □
Note that $u_0$ of the previous example is not a unique ground state, since $-u_0$ yields the
same minimum. However, this is also true for ground states of quadratic variational schemes and
therefore no surprise. The following example shows that ground states are not unique in general.

Example 4. Let $\mathcal{U} = \ell^1(\mathbb{R}^N)$ and $\mathcal{H} = \ell^2(\mathbb{R}^M)$.
Consider the regularization energy $J(u) = \|u\|_{\ell^1}$, and as an operator a matrix $K$ with
columns being normalized with respect to the $\ell^2$-norm, i.e. $(Ke_j)^T \cdot (Ke_j) = 1$, with
$e_j$ denoting the $j$-th unit vector. Then every unit vector
$e_j = (\delta_{ij})_{i = 1, \dots, N}$, with $\delta_{ij}$ denoting the Kronecker delta, is a
ground state.
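The claim of Example 4 follows from the triangle inequality, $\|Ku\|_{\ell^2} \le \sum_j |u_j| \, \|Ke_j\|_{\ell^2} = \|u\|_{\ell^1}$, with equality for unit vectors. The following sketch checks this numerically for a randomly generated matrix with normalized columns; the matrix size, the random seed and all function names are assumptions made for illustration only.

```python
import math
import random


def normalize_columns(matrix):
    """Scale each column of a row-major matrix to unit Euclidean norm."""
    rows, cols = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    return [[matrix[i][j] / norms[j] for j in range(cols)] for i in range(rows)]


def apply(matrix, u):
    return [sum(row[j] * u[j] for j in range(len(u))) for row in matrix]


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


def norm1(v):
    return sum(abs(x) for x in v)


def rayleigh_quotient(matrix, u):
    """J(u) / ||Ku||_2, which equals J at the normalized element u / ||Ku||_2."""
    return norm1(u) / norm2(apply(matrix, u))


random.seed(0)
M, N = 3, 5  # hypothetical dimensions
K = normalize_columns([[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)])

unit_quotients = [
    rayleigh_quotient(K, [1.0 if i == j else 0.0 for i in range(N)]) for j in range(N)
]
random_quotients = [
    rayleigh_quotient(K, [random.gauss(0.0, 1.0) for _ in range(N)]) for _ in range(1000)
]
```

All unit vectors attain the quotient 1, and no random trial falls below it, consistent with every $e_j$ being a ground state with smallest singular value $\lambda_0 = 1$.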

In some cases it is useful to have an alternative definition of ground states of variational
regularizations like (2.2).

Definition 3 (Ground State II). A ground state $u_0$ is defined as

$$u_0 \in \operatorname*{arg\,max}_{u \in \ker(J)^\perp,\ J(u) \le 1} \{ \|Ku\|_{\mathcal{H}} \}. \qquad (3.8)$$

If $u_0$ exists, we shall call

$$\lambda_0 = \frac{1}{\|Ku_0\|_{\mathcal{H}}} \qquad (3.9)$$

the smallest singular value.



At first glance, Definition 3 appears to be very different from Definition 2; however, for
one-homogeneous functionals $J$ both definitions are equivalent up to normalization of the
singular vector and yield the same singular value, as we will see with the following lemma.

Lemma 2. Let $J$ be a proper and one-homogeneous functional. Then Definition 2 and Definition 3
are equivalent up to multiplication of $u_0$ by $\lambda_0$.

Proof. $\Rightarrow$: Let $u_0$ be a ground state that satisfies Definition 2. Then, for
$\tilde u := u_0 / J(u_0) = u_0 / \lambda_0$ we obtain $J(\tilde u) = 1$ and
$\|K \tilde u\|_{\mathcal{H}} = 1 / J(u_0) = 1 / \lambda_0$. Thus, in order to satisfy
Definition 3, $\tilde u$ is supposed to maximize $\|K \cdot\|_{\mathcal{H}}$, i.e.
$\|K \tilde u\|_{\mathcal{H}} \ge \|Kv\|_{\mathcal{H}}$ for all $v$ with $J(v) \le 1$. We prove
this statement by contradiction and assume that there exists a function $v$ with
$\|Kv\|_{\mathcal{H}} > 1 / \lambda_0$ and $J(v) \le 1$. However, if such a function exists, we
can define $\tilde v := v / \|Kv\|_{\mathcal{H}}$ to obtain a function that satisfies
$\|K \tilde v\|_{\mathcal{H}} = 1$ and
$J(\tilde v) \le 1 / \|Kv\|_{\mathcal{H}} < \lambda_0$, which is a contradiction to $u_0$ being
a ground state in the sense of Definition 2.

$\Leftarrow$: Now let $u_0$ be a ground state in terms of Definition 3, i.e. $J(u_0) \le 1$ such
that $\|Ku_0\|_{\mathcal{H}} =: 1 / \lambda_0$ is maximized. Then we can define
$\tilde u = \lambda_0 u_0$, which satisfies $\|K \tilde u\|_{\mathcal{H}} = 1$ and
$J(\tilde u) = \lambda_0$. In analogy to the first part of the proof, we prove by contradiction
that $\tilde u$ already has to be a ground state in terms of Definition 2. We therefore assume
that there exists a function $v$ such that $J(v) < \lambda_0$ and $\|Kv\|_{\mathcal{H}} = 1$
holds. If such a function exists, then $\tilde v := v / J(v)$ exists as well. However, for
$\tilde v$ we observe $J(\tilde v) = 1$ and $\|K \tilde v\|_{\mathcal{H}} > 1 / \lambda_0$,
which is a contradiction to $u_0$ being a ground state in terms of Definition 3. □



3.2 Singular Vectors 

In analogy to singular vectors of linear operators we want to extend the concept of singular
vectors to variational frameworks of the form (2.2). The motivation is to consider the formal
optimality condition for the ground state, which is obtained by considering stationary points of
the Lagrange functional (with parameter $\lambda \in \mathbb{R}$)

$$L(u; \lambda) = J(u) - \frac{\lambda}{2} \left( \|Ku\|_{\mathcal{H}}^2 - 1 \right), \qquad (3.10)$$

which is given by

$$\lambda K^* K u \in \partial J(u),$$

where $\partial J$ denotes the subdifferential.
Hence, we define a singular vector as follows:
Definition 4 (Singular Vector). Let $J$ be convex with non-empty subdifferential $\partial J$ at
every $u \in \mathrm{dom}(J)$. Then, every function $u_\lambda \neq 0$ with
$\|Ku_\lambda\|_{\mathcal{H}} = 1$ satisfying

$$\lambda K^* K u_\lambda \in \partial J(u_\lambda) \qquad (3.11)$$

is called singular vector of $J$ with corresponding singular value $\lambda$. The subgradient
$p_\lambda = \lambda K^* K u_\lambda$ is called dual singular vector in the following.

We mention that taking a dual product with $u_\lambda$ yields the singular value relation

$$\lambda = \langle p_\lambda, u_\lambda \rangle_{\mathcal{U}}.$$

In the case of $J$ being one-homogeneous we even have

$$\lambda = J(u_\lambda),$$

which also implies $\lambda \ge \lambda_0$ for any singular value $\lambda$.

For smooth J one can prove that the ground state is a singular vector by analyzing the Lagrange 
functional above (cf. [H]). In the one-homogeneous case we give an alternative proof: 

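Proposition 1. Let $J$ be one-homogeneous and let $u_0$ be the ground state with
$\lambda_0 = J(u_0)$. Then $\lambda_0$ is a singular value and $u_0$ is a singular vector.

Proof. For $J$ being one-homogeneous, $p_0 \in \partial J(u_0)$ is equivalent to
$\langle p_0, u_0 \rangle_{\mathcal{U}} = J(u_0)$ and

$$\langle p_0, u \rangle_{\mathcal{U}} \le J(u), \quad \forall u \in \mathcal{U},$$

due to (2.5). We verify this property for $p_0 = \lambda_0 K^* K u_0$. First of all we obtain

$$\langle p_0, u_0 \rangle_{\mathcal{U}} = \lambda_0 \langle Ku_0, Ku_0 \rangle_{\mathcal{H}} = J(u_0)$$

by the definition of $\lambda_0$ and the normalization of $u_0$. Moreover, for arbitrary
$u \in \mathcal{U}$ with $Ku \neq 0$ we define $v = u / \|Ku\|_{\mathcal{H}}$ and find

$$\langle p_0, u \rangle_{\mathcal{U}} = \|Ku\|_{\mathcal{H}} \, \lambda_0 \langle Ku_0, Kv \rangle_{\mathcal{H}} \le \|Ku\|_{\mathcal{H}} \, \lambda_0.$$

Since $v$ is normalized, we have by the definition of the ground state

$$\lambda_0 = J(u_0) \le J(v) = \frac{J(u)}{\|Ku\|_{\mathcal{H}}}$$

and thus $\langle p_0, u \rangle_{\mathcal{U}} \le J(u)$. If $Ku = 0$, then

$$\langle p_0, u \rangle_{\mathcal{U}} = \lambda_0 \langle Ku_0, Ku \rangle_{\mathcal{H}} = 0 \le J(u),$$

thus $p_0 \in \partial J(u_0)$. □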

Higher singular values and singular vectors are difficult to characterize, as we shall see from
the examples below. Also, orthogonality of singular vectors corresponding to different singular
values is lost. Consider $\lambda K^* K u_\lambda = p_\lambda$ and $\mu K^* K u_\mu = p_\mu$ for
singular vectors $u_\lambda$ and $u_\mu$; then we only have

$$\frac{1}{\lambda} \langle p_\lambda, u_\mu \rangle = \frac{1}{\mu} \langle p_\mu, u_\lambda \rangle.$$

We want to give two examples: 
Figure 2: The functions $u_a$ plotted for the values $a \in \{1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8\}$.
All of these functions are singular vectors according to Definition 4, but only $u_{1/2}$ is a
ground state.

Example 5. Let us investigate the functional
$J(u) = \frac{1}{2} \|\nabla u\|_{L^2(\Omega; \mathbb{R}^n)}^2$, in order to demonstrate that the
definition of singular vectors is consistent with the definition of singular vectors of linear
operators. Since $J$ is Fréchet-differentiable, its subdifferential consists of its
Fréchet-derivative only, i.e. $\partial J(u) = \{ -\Delta u \}$. Considering $K = I$ for
simplicity, (3.11) reads as the classical eigenfunction problem of the Laplace operator, i.e.

$$-\lambda u_\lambda = \Delta u_\lambda,$$

or equivalently as the singular vector decomposition problem of the gradient operator $\nabla$.
Due to the compactness of the inverse Laplacian, the spectrum is discrete, i.e. there are
countably many different singular values.
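A discrete analogue makes the eigenvalue relation of Example 5 tangible. The sketch below assumes a uniform grid on $[0, 1]$, the standard second-difference approximation of $-\Delta$, and zero boundary values; the example itself does not fix boundary conditions, so this choice and all names are assumptions. For this discretization the sampled sine functions are exact eigenvectors, with eigenvalues $(2 - 2\cos(k \pi h)) / h^2$.

```python
import math


def neg_laplacian(v, h):
    """Apply the second-difference approximation of -d^2/dx^2 with zero boundary values."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * v[i] - left - right) / h ** 2)
    return out


n = 50                # number of interior grid points (hypothetical)
h = 1.0 / (n + 1)     # grid spacing


def mode(k):
    """Samples of sin(k*pi*x) at the interior grid points."""
    return [math.sin(k * math.pi * (i + 1) * h) for i in range(n)]


def discrete_eigenvalue(k):
    return (2.0 - 2.0 * math.cos(k * math.pi * h)) / h ** 2


residuals = []
for k in (1, 2, 3):
    v = mode(k)
    lam = discrete_eigenvalue(k)
    # Largest deviation from the relation (-Laplacian) v = lam * v.
    residuals.append(max(abs(a - lam * b) for a, b in zip(neg_laplacian(v, h), v)))
```

The residuals vanish up to rounding, and the discrete eigenvalues form a countable increasing sequence approximating $k^2 \pi^2$, in line with the discreteness of the spectrum noted above.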

The structure of the spectrum, by which we formally denote the set of singular values, changes
if we consider more degenerate cases such as the total variation frequently used in inverse
problems and imaging:

Example 6. Let us now consider $K$ being the embedding operator from $\mathrm{BV}(\Omega)$ to
$L^2(\Omega)$ on the unit interval $\Omega = [0, 1]$. For $a \in (0, 1)$ the function $u_a$
defined by (3.5) is a singular vector of TV, with singular value
$\lambda = 1 / \sqrt{(1 - a) a}$, as we shall see in the following: We can characterize the
subdifferential of TV as

$$\partial \mathrm{TV}(u) = \left\{ \operatorname{div} \varphi \;\middle|\; \|\varphi\|_{L^\infty(\Omega; \mathbb{R}^n)} \le 1, \ \varphi \cdot n|_{\partial \Omega} = 0, \ \langle \operatorname{div} \varphi, u \rangle_{L^2(\Omega)} = \mathrm{TV}(u) \right\}, \qquad (3.12)$$

for $u \in \mathrm{BV}(\Omega)$. We see that the distributional derivative of the continuous
function $q_a : [0, 1] \to [-1, 0]$ defined as

$$q_a(x) := \begin{cases} \frac{x - 1}{1 - a} & x > a \\ -\frac{x}{a} & x < a \end{cases}$$

for $a \in ]0, 1[$ is an element of $\partial \mathrm{TV}(u_a)$, since we have
$\|q_a\|_{L^\infty([0,1])} = 1$, $q_a(0) = q_a(1) = 0$ and
$\langle (q_a)', u_a \rangle_{L^2([0,1])} = \mathrm{TV}(u_a) = \lambda$ (here the derivative has
to be considered in a distributional way). Moreover, the distributional derivative of $q_a$
satisfies the singular vector relation $\lambda u_a = (q_a)' \in \partial \mathrm{TV}(u_a)$, for
$\lambda = 1 / \sqrt{(1 - a) a}$. Hence, $u_a$ is a singular vector of TV. The singular vector is
visualized for various choices of $a$ in Figure 2. Now we see that there exists a continuous
spectrum $[2, \infty)$ although in spatial dimension one the embedding operator is compact.
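The properties of the dual function used in Example 6 can be verified numerically. The sketch assumes the piecewise linear form $q_a(x) = -x/a$ for $x < a$ and $q_a(x) = (x - 1)/(1 - a)$ for $x > a$, which is the primitive of $\lambda u_a$ with $q_a(0) = 0$ when $u_a$ equals $-\sqrt{(1-a)/a}$ left of $a$ and $\sqrt{a/(1-a)}$ right of $a$; function names, grid size and step width are illustrative.

```python
import math


def singular_value(a):
    return 1.0 / math.sqrt(a * (1.0 - a))


def u_a(x, a):
    """Assumed form of the singular vector: one jump at x = a, zero mean, unit L2 norm."""
    return -math.sqrt((1.0 - a) / a) if x < a else math.sqrt(a / (1.0 - a))


def q_a(x, a):
    """Piecewise linear primitive of singular_value(a) * u_a with q_a(0) = 0."""
    return -x / a if x < a else (x - 1.0) / (1.0 - a)


def check(a, n=1000):
    lam = singular_value(a)
    xs = [(i + 0.5) / n for i in range(n)]
    h = 1e-6
    # Away from the kink at x = a, the difference quotient of q_a should equal lam * u_a.
    slope_error = max(
        abs((q_a(x + h, a) - q_a(x - h, a)) / (2 * h) - lam * u_a(x, a))
        for x in xs
        if abs(x - a) > 2 * h
    )
    sup_norm = max(abs(q_a(x, a)) for x in xs + [a])
    return slope_error, sup_norm
```

Calling `check(a)` for any `a` strictly between 0 and 1 should return a slope error near zero and a supremum norm of 1, so that the three conditions in (3.12) can be read off directly.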

We are going to consider various further examples in Section 8.
3.3 Failure of the Rayleigh Principle

Above we have defined the ground state by a Rayleigh principle, i.e. minimizing $J$ subject to
normalization and orthogonality to the kernel. In the linear case, further singular vectors
can be obtained by similar Rayleigh principles, e.g. by minimizing $J(u)$ subject to
$u \in \ker(J)^\perp$, $\|u\|_K = 1$ and $\langle u, u_0 \rangle_K = 0$. We have already seen from
Example 6 that in the nonlinear case we will not be able to compute all singular values and
singular vectors this way. However, it is at least interesting to ask whether we can obtain an
orthonormal basis of singular vectors (with orthogonality defined in the $K$-scalar product,
assuming for the moment that $K$ has trivial nullspace). As we shall demonstrate in the
following, this is not possible in general.

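Let us first discuss formally why the Rayleigh principle can fail to yield singular vectors in
the nonlinear case. Assume for this sake we have a system of orthonormal singular vectors
$u_{\lambda_j}$, $j = 0, \dots, n$, i.e.

$$\lambda_j K^* K u_{\lambda_j} = p_{\lambda_j} \in \partial J(u_{\lambda_j}), \qquad \langle u_{\lambda_j}, u_{\lambda_k} \rangle_K = \delta_{jk}.$$

Then we define

$$u_{\lambda_{n+1}} \in \operatorname*{arg\,min}_{u \in C_n} J(u), \qquad (3.13)$$

with the constraint set

$$C_n := \{ u \in \mathcal{U} \mid \|u\|_K = 1, \ \langle u_{\lambda_j}, u \rangle_K = 0, \ j = 0, \dots, n \}. \qquad (3.14)$$

Note that the existence of $u_{\lambda_{n+1}}$ follows under the same conditions as the existence
of a ground state if in addition $C_n$ is nonempty (which is always the case in infinite
dimensions). Again we may set up the Lagrange functional

$$L = J(u) - \frac{\lambda}{2} \left( \|u\|_K^2 - 1 \right) - \sum_{j=0}^{n} \mu_j \langle u_{\lambda_j}, u \rangle_K \qquad (3.15)$$

and thus obtain the optimality condition as the variation with respect to $u$ via

$$\lambda K^* K u_{\lambda_{n+1}} + \sum_{j=0}^{n} \mu_j K^* K u_{\lambda_j} = p_{\lambda_{n+1}} \in \partial J(u_{\lambda_{n+1}}). \qquad (3.16)$$

Now $u_{\lambda_{n+1}}$ is a singular vector if and only if $\mu_j = 0$ for $j = 0, \dots, n$. To
understand the latter we take a product with $u_{\lambda_k}$, $k \in \{0, \dots, n\}$, and obtain

$$\mu_k = \langle p_{\lambda_{n+1}}, u_{\lambda_k} \rangle_{\mathcal{U}}$$

due to the orthonormality. Now there is no particular reason why
$\langle p_{\lambda_{n+1}}, u_{\lambda_k} \rangle_{\mathcal{U}} = 0$ should hold, since we cannot
use the usual argument as in the linear case, namely a simple eigenvalue equation for
$u_{\lambda_k}$.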

We finish this section with a simple example that explicitly shows the failure of the Rayleigh 
principle: 

Example 7. Let $\mathcal{U} = \ell^1(\mathbb{R}^2)$ and $\mathcal{H} = \ell^2(\mathbb{R}^2)$,
$J(u) = \|u\|_1$, and choose $K$ such that

$$K^* K = \begin{pmatrix} 1 & 2\epsilon \\ 2\epsilon & \epsilon \end{pmatrix} \qquad (3.17)$$

for $0 < \epsilon < \frac{1}{4}$. Then it is straightforward to see that the ground state is
given by $u_0 = \pm (1, 0)^T$, and hence the only choice for $u_1$ orthogonal to $u_0$ is given by
$u_1 = \pm (0, 1)^T$. Without restriction of generality we can use the one with positive sign and
verify that it is not a singular vector. We find

$$\lambda K^* K u_1 = \lambda (2\epsilon, \epsilon)^T.$$

Now $\lambda K^* K u_1 \in \partial \|u_1\|_1$ if and only if its second entry equals one and the
absolute value of the first entry is less than or equal to one. We thus find the conditions
$\lambda \epsilon = 1$ and $2 \lambda \epsilon \le 1$, which cannot be satisfied simultaneously,
implying that $u_1$ is not a singular vector.
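The subgradient conditions of Example 7 can be tested mechanically. The sketch below assumes that the matrix in (3.17) is $K^*K = \begin{pmatrix} 1 & 2\epsilon \\ 2\epsilon & \epsilon \end{pmatrix}$, which is the form consistent with the product $\lambda (2\epsilon, \epsilon)^T$ stated in the example; the helper names and the value $\epsilon = 0.2$ are hypothetical.

```python
def ktk(eps):
    """Assumed form of the matrix K*K from (3.17)."""
    return [[1.0, 2.0 * eps], [2.0 * eps, eps]]


def matvec(a, u):
    return [a[0][0] * u[0] + a[0][1] * u[1], a[1][0] * u[0] + a[1][1] * u[1]]


def in_l1_subdifferential(p, u, tol=1e-12):
    """Check p in d||u||_1: p_i = sign(u_i) where u_i != 0, and |p_i| <= 1 elsewhere."""
    for p_i, u_i in zip(p, u):
        if u_i > 0 and abs(p_i - 1.0) > tol:
            return False
        if u_i < 0 and abs(p_i + 1.0) > tol:
            return False
        if u_i == 0 and abs(p_i) > 1.0 + tol:
            return False
    return True


def is_singular_vector(u, lam, eps):
    """Test the inclusion lam * K*K u in d||u||_1 from Definition 4."""
    return in_l1_subdifferential([lam * x for x in matvec(ktk(eps), u)], u)


eps = 0.2
u0, u1 = [1.0, 0.0], [0.0, 1.0]
ground_state_ok = is_singular_vector(u0, 1.0, eps)
# The second entry forces lam = 1/eps; no other value of lam can work for u1.
u1_ok = is_singular_vector(u1, 1.0 / eps, eps)
```

With these assumptions `ground_state_ok` is true and `u1_ok` is false: the only admissible value $\lambda = 1/\epsilon$ makes the first entry equal to 2, which violates the bound of one.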
From the last example we see that it is not possible to construct an orthonormal basis of
singular vectors starting from the ground state in general. However, in certain cases this may
still be possible, as we shall also see in a surprising example in the next section.



4 Scales and Scale Estimates 

In the linear singular value decomposition, i.e. $\lambda K^* K u = u$ in our setup, the singular
vectors and singular values carry information about scale. The usual definition of scale,
respectively frequency, is related to the value of $\lambda$. In analogy to the eigenvalues of the
Laplace operator mentioned above, small $\lambda$ means small frequency, respectively large scale,
and the scale information is carried by the singular vectors. For increasing $\lambda$ the
frequency is increasing, respectively the scale is decreasing. It turns out that the
interpretation of scale is even more striking in some nonlinear regularization cases, in
particular for the ROF denoising model discussed in Example 6. Here, intuitively the ground state
is the largest scale, since it includes two plateaus of size $\frac{1}{2}$. For other values of
$a$ we have $\lambda = \frac{1}{\sqrt{a(1 - a)}}$, i.e. decreasing scale the farther $a$ is away
from $\frac{1}{2}$. This is quite intuitive as well, since the singular vector has plateaus of
length $a$ and $1 - a$, and the size of the smaller plateau decreases as $\lambda$ increases. As
we shall see in the following, we can use an appropriate selection of singular vectors of the ROF
problem to obtain a standard multiscale representation.



4.1 Scales in Total Variation and the Haar Wavelet 

The probably most frequently studied and best understood multiscale decomposition is the one of
signals in the Haar wavelet basis, given on the unit interval by $\psi_{0,0}(x) = \chi(x)$ and

$$\psi_{j,k}(x) = 2^{(j-1)/2} \left( \chi(2^j x - 2k) - \chi(2^j x - 2k - 1) \right), \quad k = 0, \dots, 2^{j-1} - 1, \quad j = 1, 2, \dots \qquad (4.1)$$

with the scale function $\chi$ being the characteristic function of the unit interval, i.e.

$$\chi(x) = \begin{cases} 1 & \text{if } x \in [0, 1], \\ 0 & \text{else.} \end{cases} \qquad (4.2)$$

It is a kind of folklore that the Haar wavelet decomposition is closely related to total variation
methods in spatial dimension one (respectively to anisotropic total variation in higher
dimensions), and rough connections between ROF denoising and filtering with Haar wavelets have
been established (cf. [24], [44], [63]). Here we shall provide a more explicit connection between
the Haar wavelet basis and singular vectors of the ROF functional. For this sake we use a slightly
nonstandard definition of the total variation in the form

$$\mathrm{TV}_*(u) := \sup_{\varphi \in C^\infty(\Omega; \mathbb{R}^n),\ \|\varphi\|_{L^\infty(\Omega; \mathbb{R}^n)} \le 1} \int_\Omega u \, \operatorname{div} \varphi \, dx. \qquad (4.3)$$

Note that in contrast to (2.3) we do not choose test functions $\varphi$ with compact support, which
yields an additional boundary term. For functions $u \in W^{1,1}([0, 1]) \cap C([0, 1])$ it is straightforward
to see that

\[ \mathrm{TV}_*(u) = \int_0^1 |u'(x)| \, dx + |u(1)| + |u(0)| . \tag{4.4} \]

This in particular eliminates the kernel of the total variation, so that the ground state is indeed a
constant $u_0 \equiv 1$ with $\lambda_0 = \mathrm{TV}_*(u_0) = 2$. Then our main result is the following:

Theorem 2. Let $K : BV([0,1]) \to L^2([0,1])$ be the embedding operator and let $J = \mathrm{TV}_*$. Then
the Haar wavelet basis is an orthonormal set of singular vectors for $K$ and $J$, i.e.

\[ \lambda_{j,k} \psi_{j,k} \in \partial \mathrm{TV}_*(\psi_{j,k}) \tag{4.5} \]

with singular value $\lambda_{j,k} = 2^{(j+3)/2}$ for $j \ge 1$ and $\lambda_{0,0} = 2$. In particular $u_0$ is a ground state.

Proof. We first show that $u_0 = \psi_{0,0}$ is a ground state. Take an arbitrary continuously differentiable
function $u$ satisfying the normalization condition $\int_0^1 u^2 \, dx = 1$. Then there exists $x_0 \in [0, 1]$ such
that $|u(x_0)| \ge 1$. Thus, by the triangle inequality

\[ \mathrm{TV}_*(u) \ge |u(1) - u(x_0)| + |u(x_0) - u(0)| + |u(1)| + |u(0)| \ge 2|u(x_0)| \ge 2 \]

holds. Since $C^1([0, 1])$ is dense in $BV([0, 1])$ we conclude that the infimum of $\mathrm{TV}_*$ over all
functions of bounded variation with normalized $L^2$-norm is 2. Since $\mathrm{TV}_*(u_0) = 2$ and $\|u_0\|_{L^2([0,1])} = 1$,
it is a ground state.

We further need to verify that $p_{j,k} = \lambda_{j,k} \psi_{j,k}$ is a subgradient of the one-homogeneous
functional $\mathrm{TV}_*$ at $\psi_{j,k}$, for $j \ge 1$. For this sake we first compute $\mathrm{TV}_*(\psi_{j,k}) = 2^{(j-1)/2} \cdot 4 = 2^{(j+3)/2}$
and immediately see that

\[ \langle p_{j,k}, \psi_{j,k} \rangle = \lambda_{j,k} \int_0^1 |\psi_{j,k}|^2 \, dx = 2^{(j+3)/2} . \]

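The remaining step is to prove that

\[ \langle p_{j,k}, u \rangle \le \mathrm{TV}_*(u) \]

for arbitrary $u \in BV([0, 1])$. For this sake we consider the primitive $q$ satisfying $q' = p_{j,k}$ and
$q(0) = -1$. We observe that indeed $q$ attains its maxima and minima on the jump set of $p_{j,k}$ and
they equal $+1$ respectively $-1$. Hence, $\|q\|_\infty = 1$ and we find

\[ \langle p_{j,k}, u \rangle = \int_0^1 q'(x) u(x) \, dx \le \sup_{\substack{\varphi \in W^{1,2}([0,1]; \mathbb{R}) \\ \|\varphi\|_{L^\infty([0,1])} \le 1}} \int_0^1 u \, \varphi' \, dx . \]

Finally, by a density argument we conclude that this supremum equals the one over $C^\infty$, hence

\[ \langle p_{j,k}, u \rangle \le \sup_{\substack{\varphi \in C^\infty([0,1]; \mathbb{R}) \\ \|\varphi\|_{L^\infty([0,1])} \le 1}} \int_0^1 u \, \varphi' \, dx = \mathrm{TV}_*(u) . \]

□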
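The statement of Theorem 2 can be checked numerically on a dyadic grid. The following is only a minimal sketch under stated assumptions: the wavelets are sampled as piecewise constant vectors, the support convention is the one written in the docstring, and the helper names `haar`, `tv_star` and `inner` are hypothetical and not part of the paper.

```python
import math

def haar(j, k, n):
    """Sample psi_{j,k} on n equal cells of [0, 1] (n must be a multiple of 2**j).

    Assumed convention: psi_{0,0} is the constant 1; for j >= 1 the wavelet is
    +2**((j-1)/2) on [2k/2**j, (2k+1)/2**j) and -2**((j-1)/2) on the next
    dyadic interval of the same length.
    """
    if j == 0:
        return [1.0] * n
    amp = 2.0 ** ((j - 1) / 2)
    cell = n // 2 ** j  # grid cells per dyadic interval of length 2**-j
    u = [0.0] * n
    for i in range(2 * k * cell, (2 * k + 1) * cell):
        u[i] = amp
    for i in range((2 * k + 1) * cell, (2 * k + 2) * cell):
        u[i] = -amp
    return u

def tv_star(u):
    """Sum of jumps plus the boundary terms |u(0)| + |u(1)|, as in (4.4)."""
    jumps = sum(abs(b - a) for a, b in zip(u, u[1:]))
    return jumps + abs(u[0]) + abs(u[-1])

def inner(u, v):
    """L2([0,1]) inner product of two piecewise constant functions."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

n = 64
for j in range(1, 5):
    for k in range(2 ** (j - 1)):
        psi = haar(j, k, n)
        # columns: j, k, TV_*(psi), claimed singular value, squared L2 norm
        print(j, k, tv_star(psi), 2 ** ((j + 3) / 2), inner(psi, psi))
```

Since $\langle \lambda_{j,k}\psi_{j,k}, \psi_{j,k} \rangle = \lambda_{j,k}$ for normalized wavelets, agreement of the third and fourth column reproduces the identity $\mathrm{TV}_*(\psi_{j,k}) = \lambda_{j,k}$ used in the proof.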



4.2 Scale Estimates 



In the following we want to demonstrate how singular vectors can be used to derive sharp error
estimates on the components at different scales without any additional prior knowledge on the
solution, as e.g. the assumption of specific source conditions (cf. [531 S3]). We will make the
connection to scale explicit by again considering the case of the ROF-model and the corresponding
singular vectors.

Let us assume we are given a singular vector $u_\lambda$ with singular value $\lambda$ and dual singular vector
$p_\lambda$, i.e.

\[ \lambda K^* K u_\lambda = p_\lambda \in \partial J(u_\lambda) \]

for $\|K u_\lambda\|_{\mathcal{H}} = 1$. Then we can estimate solutions of (2.2) for input data given in terms of
$f = K\tilde{u} + \eta$, with $\tilde{u} \in \mathrm{dom}(K) \cap \mathrm{dom}(J)$ and $\eta \in \mathcal{H}$, with respect to this particular singular vector.



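Theorem 3. For input data $f = K\tilde{u} + \eta$, $\tilde{u} \in \mathrm{dom}(K)$ and $\eta \in \mathcal{H}$, the solution $\hat{u}$ of (2.2) satisfies
the estimate

\[ \left| \frac{1}{\lambda} \langle p_\lambda, \hat{u} - \tilde{u} \rangle_{\mathcal{U}} \right| \le \left| \langle \eta, K u_\lambda \rangle_{\mathcal{H}} \right| + \alpha \left| \langle \hat{p}, u_\lambda \rangle_{\mathcal{U}} \right| . \]

Proof. The estimate is simply a consequence of taking a dual product of the optimality condition
of (2.2) with $u_\lambda$. The optimality condition reads as

\[ K^* K(\hat{u} - \tilde{u}) + \alpha \hat{p} = K^* \eta , \]

for $\hat{p} \in \partial J(\hat{u})$. Taking the dual product with $u_\lambda$ yields

\[ \langle K^* K u_\lambda, \hat{u} - \tilde{u} \rangle_{\mathcal{U}} = \langle K^* \eta - \alpha \hat{p}, u_\lambda \rangle_{\mathcal{U}} , \]

which is equivalent to

\[ \frac{1}{\lambda} \langle p_\lambda, \hat{u} - \tilde{u} \rangle_{\mathcal{U}} = \langle K^* \eta - \alpha \hat{p}, u_\lambda \rangle_{\mathcal{U}} . \]

The estimate for the absolute value of the right-hand side simply follows from the triangle
inequality. □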

Remark 1. In case of $J$ being one-homogeneous we have $|\langle \hat{p}, u_\lambda \rangle_{\mathcal{U}}| \le J(u_\lambda) = \lambda$ and the estimate
therefore can be used to deduce

\[ \left| \frac{1}{\lambda} \langle p_\lambda, \hat{u} - \tilde{u} \rangle_{\mathcal{U}} \right| \le \alpha \lambda + \left| \langle \eta, K u_\lambda \rangle_{\mathcal{H}} \right| . \]

The two terms in the error estimate can be easily interpreted. The term $\alpha\lambda$ is a worst-case estimate
for the bias at the scale of the used singular value, while the second term describes the impact of
noise on a specific scale. Note that the scalar product of the noise with $K u_\lambda$ actually implies some
averaging of the noise, which usually reduces the noise variance more strongly on larger scales than
on smaller ones. This can be analyzed in particular for statistical noise models such as additive
Gaussian noise.

In the following we want to demonstrate how Theorem 3 can be used to derive estimates for
the difference of the reconstruction $\hat{u}$ and the input data $\tilde{u}$ on different scales.

Example 8. With this first example we want to find a worst-case estimate for a reconstruction
$\hat{u}$ of (2.4), with respect to the input data $f = \tilde{u} + \eta$ on the scale $[0, a] \subset [0, 1]$, for $0 < a < 1$. We
want to point out that the ROF model is mean-value preserving, i.e. $\int_0^1 (\hat{u} - \tilde{u}) \, dx = 0$. Let us
investigate the integral equation

\[ \begin{aligned}
\int_0^a (\hat{u} - \tilde{u}) \, dx &= c_0 \int_0^1 u_0 (\hat{u} - \tilde{u}) \, dx + c_1 \int_0^1 u_1 (\hat{u} - \tilde{u}) \, dx \\
&= \int_0^1 (c_0 u_0 + c_1 u_1) (\hat{u} - \tilde{u}) \, dx \\
&= \int_0^a (c_0 u_0 + c_1 u_1) (\hat{u} - \tilde{u}) \, dx + \int_a^1 (c_0 u_0 + c_1 u_1) (\hat{u} - \tilde{u}) \, dx ,
\end{aligned} \tag{4.6} \]

with the function $u_0$ defined as $u_0(x) \equiv 1$, and with $u_1(x) := u_a(x)$ being the singular vector of
Example 6. Then, (4.6) reads as



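\[ \int_0^a (\hat{u} - \tilde{u}) \, dx = \left( c_0 - \sqrt{\frac{1-a}{a}} \, c_1 \right) \int_0^a (\hat{u} - \tilde{u}) \, dx + \left( c_0 + \sqrt{\frac{a}{1-a}} \, c_1 \right) \int_a^1 (\hat{u} - \tilde{u}) \, dx . \tag{4.7} \]

In order to satisfy (4.7), the coefficients $c_0$ and $c_1$ have to be chosen such that they solve the linear
system of equations

\[ \begin{pmatrix} 1 & -\sqrt{\frac{1-a}{a}} \\ 1 & \sqrt{\frac{a}{1-a}} \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} . \tag{4.8} \]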



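It is easy to see that $c_0 = a$ and $c_1 = -\sqrt{a(1-a)}$ solve (4.8). Thus, we obtain the following scale
estimate

\[ \begin{aligned}
\left| \int_0^a (\hat{u} - \tilde{u}) \, dx \right| &= \left| a \int_0^1 (\hat{u} - \tilde{u}) \, dx - \sqrt{a(1-a)} \int_0^1 u_1 (\hat{u} - \tilde{u}) \, dx \right| \\
&= \sqrt{a(1-a)} \left| \int_0^1 u_1 (\hat{u} - \tilde{u}) \, dx \right| \\
&\le \sqrt{a(1-a)} \left( \frac{\alpha}{\sqrt{a(1-a)}} + \left| \int_0^1 \eta \, u_1 \, dx \right| \right) \\
&= \alpha + \sqrt{a(1-a)} \left| \int_0^1 \eta \, u_1 \, dx \right| ,
\end{aligned} \]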



since $1/\sqrt{a(1-a)}$ is the singular value of $u_1$. Note that for $\eta = 0$ we simply obtain

\[ \left| \int_0^a (\hat{u} - \tilde{u}) \, dx \right| \le \alpha . \]

This estimate is obviously not sharp if $a \to 1$, since $\int_0^1 (\hat{u} - \tilde{u}) \, dx = 0$, but gives a worst case estimate
if $a$ is far away from the boundary as in case of $a = 1/2$. If we consider e.g. $\tilde{u}(x) = x - 1/2$, the
estimate guarantees that the area visualized in Figure 3 is equal or smaller than $\alpha$. Indeed the area
equals $\alpha$, which becomes clear by computing the exact solution of (2.4) for $f(x) = \tilde{u}(x) = x - 1/2$.
For $0 < \alpha < 1/8$ the solution satisfies

\[ \hat{u}(x) = \begin{cases} \sqrt{2\alpha} - \frac{1}{2} & x \in [0, \sqrt{2\alpha}[ \\ x - \frac{1}{2} & x \in [\sqrt{2\alpha}, 1 - \sqrt{2\alpha}] \\ \frac{1}{2} - \sqrt{2\alpha} & x \in \, ]1 - \sqrt{2\alpha}, 1] \end{cases} \]

and thus, $\int_0^{1/2} (\hat{u} - \tilde{u}) \, dx = -\int_{1/2}^1 (\hat{u} - \tilde{u}) \, dx = \alpha$ holds true.

Figure 3: The function $\tilde{u}(x) = x - 1/2$ (solid red line) and the solution of (2.4) for input data
given in terms of $f = \tilde{u}$ and $\alpha = 1/18$ (dashed blue line). The areas indicated by wavy lines equal
$-1/18$ and $1/18$, respectively.

Example 9. As a second example we want to focus on a scale estimate on the scale $[a, b] \subset [0, 1]$,
for $0 < a < b < 1$. Again we study (2.4) and, in analogy to the previous example, consider



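\[ \int_a^b (\hat{u} - \tilde{u}) \, dx = \int_0^1 (c_0 u_0 + c_1 u_1 + c_2 u_2) (\hat{u} - \tilde{u}) \, dx , \tag{4.9} \]

with $u_0(x) \equiv 1$, $u_1(x) := u_a(x)$ and $u_2(x) := -u_b(x)$, for $u_a$ and $u_b$ being singular vectors as
defined in Example 6. Thus, we can rewrite (4.9) to

\[ \begin{aligned}
\int_a^b (\hat{u} - \tilde{u}) \, dx ={}& \int_0^a (c_0 u_0 + c_1 u_1 + c_2 u_2) (\hat{u} - \tilde{u}) \, dx + \int_a^b (c_0 u_0 + c_1 u_1 + c_2 u_2) (\hat{u} - \tilde{u}) \, dx \\
&+ \int_b^1 (c_0 u_0 + c_1 u_1 + c_2 u_2) (\hat{u} - \tilde{u}) \, dx \\
={}& \left( c_0 - \sqrt{\tfrac{1-a}{a}} \, c_1 + \sqrt{\tfrac{1-b}{b}} \, c_2 \right) \int_0^a (\hat{u} - \tilde{u}) \, dx + \left( c_0 + \sqrt{\tfrac{a}{1-a}} \, c_1 + \sqrt{\tfrac{1-b}{b}} \, c_2 \right) \int_a^b (\hat{u} - \tilde{u}) \, dx \\
&+ \left( c_0 + \sqrt{\tfrac{a}{1-a}} \, c_1 - \sqrt{\tfrac{b}{1-b}} \, c_2 \right) \int_b^1 (\hat{u} - \tilde{u}) \, dx .
\end{aligned} \]

Similar to the previous example we therefore have to make sure that $c_0$, $c_1$ and $c_2$ satisfy

\[ \begin{pmatrix} 1 & -\sqrt{\frac{1-a}{a}} & \sqrt{\frac{1-b}{b}} \\ 1 & \sqrt{\frac{a}{1-a}} & \sqrt{\frac{1-b}{b}} \\ 1 & \sqrt{\frac{a}{1-a}} & -\sqrt{\frac{b}{1-b}} \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} . \tag{4.10} \]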



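Solving (4.10) for $c_0$, $c_1$ and $c_2$ yields $c_0 = b - a$, $c_1 = \sqrt{a(1-a)}$ and $c_2 = \sqrt{b(1-b)}$.
Consequently, Theorem 3 allows us to compute the estimate

\[ \begin{aligned}
\left| \int_a^b (\hat{u} - \tilde{u}) \, dx \right| &\le (b-a) \left| \int_0^1 (\hat{u} - \tilde{u}) \, dx \right| + \sqrt{a(1-a)} \left| \int_0^1 u_1 (\hat{u} - \tilde{u}) \, dx \right| + \sqrt{b(1-b)} \left| \int_0^1 u_2 (\hat{u} - \tilde{u}) \, dx \right| \\
&\le 2\alpha + \sqrt{a(1-a)} \left| \int_0^1 \eta \, u_1 \, dx \right| + \sqrt{b(1-b)} \left| \int_0^1 \eta \, u_2 \, dx \right| .
\end{aligned} \]

In case of clean data, i.e. $\eta = 0$, we see that

\[ \left| \int_a^b (\hat{u} - \tilde{u}) \, dx \right| \le 2\alpha \]

holds. It is remarkable that the worst-case estimate on the scale $[a, b]$ does not depend on the
boundary values $a$ and $b$. However, if we want to estimate the mean value, then an additional
factor $\frac{1}{b-a}$ comes up, which increases with decreasing size of the interval.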
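The coefficient choices of Examples 8 and 9 can be sanity-checked numerically. The sketch below assumes that the singular vector $u_a$ of Example 6 is the zero-mean, $L^2$-normalized step function with a single jump at $a$ (values $-\sqrt{(1-a)/a}$ and $\sqrt{a/(1-a)}$); this form is inferred from the stated solutions of (4.8) and (4.10), not quoted from Example 6. All function names are hypothetical.

```python
import math

def u_jump(a):
    """Assumed values (left, right) of u_a on [0, a) and (a, 1]."""
    return -math.sqrt((1 - a) / a), math.sqrt(a / (1 - a))

def singular_value(a):
    """Total variation of u_a, i.e. the size of its single jump."""
    left, right = u_jump(a)
    return right - left

def combo_example8(a):
    """Values of c0*u0 + c1*u_a on [0, a) and (a, 1] for c0 = a, c1 = -sqrt(a(1-a))."""
    c0, c1 = a, -math.sqrt(a * (1 - a))
    left, right = u_jump(a)
    return c0 + c1 * left, c0 + c1 * right

def combo_example9(a, b):
    """Values of c0*u0 + c1*u_a + c2*(-u_b) on [0, a), (a, b) and (b, 1]."""
    c0, c1, c2 = b - a, math.sqrt(a * (1 - a)), math.sqrt(b * (1 - b))
    a_left, a_right = u_jump(a)
    b_left, b_right = u_jump(b)
    return (c0 + c1 * a_left - c2 * b_left,
            c0 + c1 * a_right - c2 * b_left,
            c0 + c1 * a_right - c2 * b_right)

def rof_area(alpha, cells=20000):
    """Midpoint-rule integral of (u_hat - u_tilde) over [0, 1/2] for f = x - 1/2.

    Uses the explicit solution of Example 8 (valid for 0 < alpha < 1/8), for
    which u_hat - u_tilde = max(sqrt(2*alpha) - x, 0) on [0, 1/2].
    """
    t = math.sqrt(2 * alpha)
    h = 0.5 / cells
    return sum(max(t - (i + 0.5) * h, 0.0) for i in range(cells)) * h

print(combo_example8(0.3))       # approximately (1, 0): indicator of [0, a)
print(combo_example9(0.2, 0.7))  # approximately (0, 1, 0): indicator of (a, b)
print(singular_value(0.3), 1 / math.sqrt(0.3 * 0.7))
print(rof_area(1 / 18))          # approximately alpha = 1/18
```

The first two lines confirm that the chosen linear combinations reproduce the characteristic functions of $[0, a]$ and $[a, b]$, which is exactly what (4.8) and (4.10) encode.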
5 Exact Reconstruction of Singular Vectors

In this section we discuss exact solutions of variational regularization methods, namely the recovery
of singular vectors. We will relate these results to properties of Bregman distances. First of all,
we want to recall a criterion that has been shown by Meyer in [52] in order to derive trivial
ground states for the ROF-model. These considerations can be generalized by the use of Bregman
distances, as can be seen from the following theorem.



Theorem 4. Let $f$ be such that

\[ \frac{1}{\alpha} K^* f \in \partial J(0) \tag{5.1} \]

is satisfied, then a minimizer of (2.2) is given by $\hat{u} = 0$. Vice versa, $\hat{u} = 0$ is a minimizer of (2.2)
only if (5.1) holds.

Proof. We can rewrite (2.2) to

\[ \hat{u} \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku\|_{\mathcal{H}}^2 + \alpha \left( J(u) - \left\langle \frac{1}{\alpha} K^* f, u \right\rangle_{\mathcal{U}} \right) + \frac{1}{2} \|f\|_{\mathcal{H}}^2 \right\} . \]

Since (5.1) is satisfied, we can define $q := (K^* f)/\alpha$ such that

\[ D_J^q(u, 0) = J(u) - J(0) - \langle q, u \rangle_{\mathcal{U}} \]

is a non-negative Bregman distance. Hence, ignoring the constant part $\frac{1}{2}\|f\|_{\mathcal{H}}^2$ we have

\[ \hat{u} \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku\|_{\mathcal{H}}^2 + \alpha D_J^q(u, 0) \right\} , \]

for which the obvious minimizer is given via $\hat{u} = 0$, since both terms are non-negative and vanish
for $u = 0$. It is straightforward to see the opposite condition from the optimality condition for
$\hat{u} = 0$. □



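Remark 2. Note that if (5.1) is satisfied for a specific $\alpha$, then (5.1) is automatically guaranteed
for every $\tilde{\alpha} \ge \alpha$, since $(K^* f)/\alpha \in \partial J(0)$ implies

\[ J(v) \ge \left\langle \frac{1}{\alpha} K^* f, v \right\rangle_{\mathcal{U}} \]

for all $v \in \mathrm{dom}(J)$. If we multiply both sides of the inequality with $\alpha$ we obtain

\[ \alpha J(v) \ge \langle K^* f, v \rangle_{\mathcal{U}} , \]

since $\alpha$ is positive. Due to the positivity of $J$ we even have

\[ \tilde{\alpha} J(v) \ge \alpha J(v) \]

for all $v \in \mathrm{dom}(J)$ and $\tilde{\alpha} \ge \alpha$, and hence, (5.1) is guaranteed for all $\tilde{\alpha} \ge \alpha$.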
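Criterion (5.1) and the monotonicity in Remark 2 are easy to observe in a finite-dimensional special case. The sketch below assumes $K = I$ on $\mathbb{R}^N$ and $J = \|\cdot\|_{\ell^1}$, for which the minimizer of (2.2) is the well-known componentwise soft-thresholding of $f$ and (5.1) reduces to $\max_i |f_i| \le \alpha$. The function names are hypothetical.

```python
import math

def soft_threshold(f, alpha):
    """Minimizer of 0.5*||u - f||^2 + alpha*||u||_1 (assumes K = identity)."""
    return [math.copysign(max(abs(x) - alpha, 0.0), x) for x in f]

def condition_51(f, alpha):
    """(1/alpha) f lies in the subdifferential of the l1-norm at 0."""
    return max(abs(x) for x in f) <= alpha

f = [0.5, -1.5, 1.0]
for alpha in (1.0, 1.5, 2.0):
    u = soft_threshold(f, alpha)
    # The solution vanishes exactly when (5.1) holds, and stays zero for larger alpha.
    print(alpha, condition_51(f, alpha), all(x == 0 for x in u))
```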

Theorem 4 yields an explicit condition on the regularization parameter $\alpha$ to enforce the solution
of (2.2) to be zero. Furthermore, according to the following lemma, for singular vectors $u_\lambda$ of
one-homogeneous functionals there even have to exist parameters $\alpha$ such that (5.1) is fulfilled for
$f = K u_\lambda$.



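Lemma 3. Let $J$ be one-homogeneous. If $u \in \mathrm{dom}(J) \cap \mathrm{dom}(K)$ is a function such that (5.1)
does not hold for any $\alpha \in \mathbb{R}_{>0}$ with data $f = Ku$, i.e.

\[ \frac{1}{\alpha} K^* K u \notin \partial J(0) \quad \forall \alpha \in \mathbb{R}_{>0} , \]

then $u$ is not a singular vector with singular value $\lambda \neq 0$.

Proof. We want to prove the statement by contradiction. We therefore assume that on the one
hand, $u$ is a singular vector with singular value $\lambda$, i.e. $\lambda K^* K u = p \in \partial J(u)$. Taking a duality
product of this relation with $u$ yields the equality

\[ \lambda \|Ku\|_{\mathcal{H}}^2 = J(u) , \tag{5.2} \]

due to the one-homogeneity of $J$. Moreover, from the definition of the subdifferential, the singular
value property yields

\[ J(v) \ge J(u) + \lambda \langle K^* K u, v - u \rangle_{\mathcal{U}} \quad \forall v \in \mathrm{dom}(J) . \tag{5.3} \]

On the other hand, we know due to $(K^* K u)/\alpha \notin \partial J(0)$ for all $\alpha \in \mathbb{R}_{>0}$ that there has to exist a
function $v \in \mathrm{dom}(J)$ with

\[ \langle Ku, Kv \rangle_{\mathcal{H}} > \alpha J(v) . \tag{5.4} \]

If we insert (5.3) into (5.4), for the particular choice of $v$ we therefore obtain

\[ \begin{aligned}
\langle Ku, Kv \rangle_{\mathcal{H}} &> \alpha \left( J(u) + \lambda \langle Ku, Kv \rangle_{\mathcal{H}} - \lambda \|Ku\|_{\mathcal{H}}^2 \right) \\
\Leftrightarrow \quad (1 - \lambda\alpha) \langle Ku, Kv \rangle_{\mathcal{H}} &> \alpha \left( J(u) - \lambda \|Ku\|_{\mathcal{H}}^2 \right) .
\end{aligned} \tag{5.5} \]

Equation (5.5) is supposed to be true for every $\alpha \in \mathbb{R}_{>0}$, especially for the particular choice
$\alpha = 1/\lambda$. In this case, (5.5) reads as

\[ \lambda \|Ku\|_{\mathcal{H}}^2 > J(u) , \]

for $\lambda > 0$, and therefore is a contradiction to (5.2). □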

The equivalent reverse statement of Lemma 3 is that for every data $f$ created by a singular
vector $u_\lambda$, i.e. $f = K u_\lambda$, there exists a parameter $\tilde{\alpha}$ such that (5.1) is valid for $\alpha \ge \tilde{\alpha}$. Moreover,
condition (5.1) shows that the data $f$ needs to satisfy certain properties in order for the solution to vanish
for a large regularization parameter $\alpha$, e.g. $f$ does need to have zero mean for $K = I$ in the case
of TV-regularization.

Remark 3. Note that, however, it is possible for a particular function $f$ that there does not exist
a parameter $\alpha$ such that (5.1) is fulfilled, although $f = K u_0$ is given in terms of a trivial ground
state (and therefore in terms of a singular vector) with singular value $\lambda = 0$. A simple example
would be that $f \neq 0$ is a constant function in case of $K = I$ and $J = \mathrm{TV}$.



5.1 Clean Data

In case of clean data $f = \gamma K u_\lambda$, $\gamma > 0$ and $u_\lambda$ being a non-trivial singular vector, we are interested
in finding a solution of (2.2) that can be expressed in terms of this singular vector, i.e. $\hat{u} = c u_\lambda$ for
a positive constant $c$. We want to call such a function an almost exact solution.

The following theorem gives us the conditions on $\alpha$ needed for recovering a multiple of $u_\lambda$.

Theorem 5. Let $J$ be one-homogeneous. Furthermore, let $u_\lambda$ be a singular vector with
corresponding singular value $\lambda$. Then, if the data $f$ is given as $f = \gamma K u_\lambda$ for a positive constant $\gamma$, a
solution of (2.2) is $\hat{u} = c u_\lambda$ for

\[ c = \gamma - \alpha\lambda , \]

if $\gamma > \alpha\lambda$ is satisfied.



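Proof. Again, we rewrite (2.2) in terms of a Bregman distance. Inserting $f = \gamma K u_\lambda$ yields

\[ \begin{aligned}
\hat{u} &\in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - \gamma K u_\lambda\|_{\mathcal{H}}^2 + \alpha J(u) \right\} \\
&= \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - c K u_\lambda\|_{\mathcal{H}}^2 + \alpha J(u) - \frac{\gamma - c}{\lambda} \langle \lambda K^* K u_\lambda, u \rangle_{\mathcal{U}} + \frac{1}{2} \left( \langle \gamma K u_\lambda, \gamma K u_\lambda \rangle_{\mathcal{H}} - \langle c K u_\lambda, c K u_\lambda \rangle_{\mathcal{H}} \right) \right\} .
\end{aligned} \]

By ignoring the constant part, for $\gamma > \alpha\lambda$ and $c = \gamma - \alpha\lambda > 0$ we therefore obtain

\[ \hat{u} \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - c K u_\lambda\|_{\mathcal{H}}^2 + \alpha D_J^q(u, c u_\lambda) \right\} , \]

with

\[ q = \lambda K^* K u_\lambda \in \partial J(u_\lambda) = \partial J(c u_\lambda) \]

due to the one-homogeneity of $J$, which also gives $J(c u_\lambda) = \langle q, c u_\lambda \rangle_{\mathcal{U}}$ and hence
$D_J^q(u, c u_\lambda) = J(u) - \langle q, u \rangle_{\mathcal{U}}$. An obvious minimizer is $\hat{u} = c u_\lambda$. □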
Note that the above result does not yield that the multiple $c u_\lambda$ of the singular vector is the unique minimizer,
except for $K$ having trivial nullspace. To see this, let us consider model (2.2) with $J(u) = \|u\|_{\ell^1}$
and $K$ being the matrix

\[ K = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} . \]

It is obvious that $K$ is normalized with respect to the $\ell^2$-norm, but is neither injective nor
surjective. Due to Example 4 both $e_1$ and $e_2$ are singular vectors. However, both yield the same output
$f = (1/\sqrt{2}, 1/\sqrt{2})^T$ and therefore both $\hat{u} = (1 - \alpha) e_1$ and $\hat{u} = (1 - \alpha) e_2$ satisfy Theorem 5.
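The non-uniqueness can be observed directly by evaluating the functional. The following sketch assumes the $2 \times 2$ matrix with both columns equal to $(1/\sqrt{2}, 1/\sqrt{2})^T$ and uses a coarse grid search only as a plausibility check, not as a proof of minimality; `objective` and the grid are hypothetical helpers.

```python
import math

S = 1 / math.sqrt(2)
K = [[S, S], [S, S]]  # assumed matrix: both columns equal (1/sqrt 2, 1/sqrt 2)^T
f = [S, S]            # f = K e_1 = K e_2

def objective(u, alpha):
    """0.5*||K u - f||^2 + alpha*||u||_1 for the two-dimensional example."""
    r = [K[i][0] * u[0] + K[i][1] * u[1] - f[i] for i in range(2)]
    return 0.5 * (r[0] ** 2 + r[1] ** 2) + alpha * (abs(u[0]) + abs(u[1]))

alpha = 0.25
e1_solution = (1 - alpha, 0.0)
e2_solution = (0.0, 1 - alpha)

# Coarse grid over [-1.5, 1.5]^2 with step 0.01 (contains both candidates).
grid = [i / 100 - 1.5 for i in range(301)]
best_on_grid = min(objective((x, y), alpha) for x in grid for y in grid)

print(objective(e1_solution, alpha), objective(e2_solution, alpha), best_on_grid)
```

Both candidates attain the same value $\alpha - \alpha^2/2$, and no grid point does better; in fact every convex combination of the two attains it as well, since the functional only depends on $u_1 + u_2$ and $\|u\|_{\ell^1}$.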

We also mention that the main line of Theorem 5 also holds for $p$-homogeneous functionals,
but with different constants $c$. In the following we turn our attention to noisy data, where the
one-homogeneity is much more essential, e.g. exact reconstruction for a wide class of noise realizations
cannot hold for quadratic regularizations.

5.2 Noisy Data

The multivaluedness of the subdifferential $\partial J$ allows to obtain almost exact solutions even in
the presence of noisy data, i.e. $f = \gamma K u_\lambda + n$, though the case of noisy data is slightly more
complicated to prove. If the most significant features of $u_\lambda$ with respect to the regularization
energy $J$ are left unaffected by the noise, then the following theorem guarantees almost exact
recovery of the singular vector $u_\lambda$.

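Theorem 6. Let $J$ be one-homogeneous. Furthermore, let $u_\lambda$ be a singular vector with
corresponding singular value $\lambda$. The data $f$ is assumed to be corrupted by noise $n$, i.e. $f = \gamma K u_\lambda + n$
for a positive constant $\gamma$, such that there exist positive constants $\mu$ and $\eta$ with

\[ \mu K^* K u_\lambda + \eta K^* n \in \partial J(u_\lambda) . \tag{5.6} \]

Then, a solution of (2.2) is given by $\hat{u} = c u_\lambda$ for

\[ c = \gamma - \alpha\lambda + \frac{\lambda - \mu}{\eta} \]

if $\gamma$ satisfies the SNR-condition

\[ \gamma > \frac{\mu}{\eta} , \tag{5.7} \]

and if $\alpha \in \left[ 1/\eta, \; \gamma/\lambda + 1/\eta - \mu/(\lambda\eta) \right[$ holds.

Proof. Similar to the proof of Theorem 5 we rewrite (2.2) to

\[ \begin{aligned}
\hat{u} &\in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - \gamma K u_\lambda - n\|_{\mathcal{H}}^2 + \alpha J(u) \right\} \\
&= \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - c K u_\lambda\|_{\mathcal{H}}^2 + \alpha J(u) - \alpha \left\langle \frac{\gamma - c}{\alpha} K^* K u_\lambda + \frac{1}{\alpha} K^* n, u \right\rangle_{\mathcal{U}} \right\} \\
&= \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - c K u_\lambda\|_{\mathcal{H}}^2 + \alpha D_J^q(u, c u_\lambda) \right\} ,
\end{aligned} \]

with obvious minimizer $\hat{u} = c u_\lambda$, if we neglect the constant parts and if we can manage to choose
$c$ such that

\[ q = \frac{\gamma - c}{\alpha} K^* K u_\lambda + \frac{1}{\alpha} K^* n \in \partial J(u_\lambda) = \partial J(c u_\lambda) . \]

Note that since $\partial J(u_\lambda)$ is a convex set, not only $\lambda K^* K u_\lambda$ and (5.6) are elements of $\partial J(u_\lambda)$, but
also any convex combination, i.e.

\[ \left( (1 - \beta) \lambda + \beta\mu \right) K^* K u_\lambda + \beta\eta K^* n \in \partial J(u_\lambda) , \]

for each $\beta \in [0, 1]$.

Hence, we need to choose $c > 0$ and $\beta \in [0, 1]$ such that $1/\alpha = \beta\eta$ and $(\gamma - c)/\alpha = (1 - \beta)\lambda + \beta\mu$.
Therefore, solutions for $\beta$ and $c$ are

\[ \beta = \frac{1}{\alpha\eta} \]

and

\[ c = \gamma - \alpha\lambda + \frac{\lambda - \mu}{\eta} . \]

In order to satisfy $\beta \le 1$ and $c > 0$, $\alpha$ has to be chosen such that $\alpha$ is bounded via

\[ \frac{1}{\eta} \le \alpha < \frac{\gamma}{\lambda} + \frac{1}{\eta} - \frac{\mu}{\lambda\eta} . \]

This condition can only be satisfied if $\gamma > \mu/\eta$ holds. □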



At first glance (5.6) seems unreasonably restrictive, e.g. for smooth functionals $J$ it can
only hold if the noise is generated by the singular vector itself. This however differs completely in
the situation of singular regularization functionals with large subdifferentials. To see this, let us
consider the setup of Example 4 with a ground state $e_i$, together with the reasonable assumption
that no pair of columns in $K$ is linearly dependent. For

\[ p = \lambda K^* K e_i \in \partial \|e_i\|_{\ell^1(\mathbb{R}^N)} \]

we have

\[ p_i = p \cdot e_i = \lambda \, e_i \cdot K^* K e_i = \lambda = 1 \]

and hence for $j \neq i$

\[ |p \cdot e_j| = |e_j \cdot K^* K e_i| = |K_j \cdot K_i| < 1 , \]

where we denote by $K_j$ the $j$-th column of $K$. Now let $n \in \mathbb{R}^M$ be the noise vector and $v = K^* n$.
In order to satisfy (5.6) for $p = \mu K^* K e_i + \eta K^* n$ we need

\[ 1 = p \cdot e_i = \mu \, e_i \cdot K^* K e_i + \eta \, e_i \cdot v = \mu + \eta v_i \]

and

\[ 1 > |p \cdot e_j| = |\mu K_j \cdot K_i + \eta v_j| . \]

The first condition is satisfied if $\mu = 1 - \eta v_i$. Since then

\[ |\mu K_j \cdot K_i + \eta v_j| = |K_j \cdot K_i| + O(\eta) \]

for $\eta$ small and $|K_j \cdot K_i| < 1$, we indeed conclude that there always exists $\eta$ such that (5.6) is
satisfied. Note that in this case $\lambda = 1$ and hence $c = \gamma - \alpha + v_i$.

6 Bias of Variational Methods 

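We have seen in the previous section that there remains a small bias in the exact reconstruction
of singular vectors, which is incorporated in the fact that $c < \gamma$, with difference depending on the
actual singular value. For $f = \gamma K u_\lambda$, this difference yields a residual

\[ \|K\hat{u} - f\|_{\mathcal{H}} = \alpha\lambda \|K u_\lambda\|_{\mathcal{H}} = \alpha\lambda , \]

i.e. a bias in the solution, which is minimal for the ground state $\lambda_0$. In this short section we will
show that indeed this bias prevails for arbitrary data and the residual is bounded below by $\alpha\lambda_0$,
which again confirms the extremal role of the ground state.

Theorem 7. Let $J$ be one-homogeneous, $f \in \mathcal{H}$ be arbitrary, $\alpha > 0$, and let

\[ u_\alpha \in \operatorname*{arg\,min}_{u \in \mathcal{U}} \left( \frac{1}{2} \|Ku - f\|_{\mathcal{H}}^2 + \alpha J(u) \right) . \]

Then

\[ \|K u_\alpha\|_{\mathcal{H}} \le \max\{ \|f\|_{\mathcal{H}} - \alpha\lambda_0, 0 \} , \tag{6.1} \]

where $\lambda_0$ is the smallest singular value. As a direct consequence, if $\|f\|_{\mathcal{H}} > \alpha\lambda_0$, then

\[ \|K u_\alpha - f\|_{\mathcal{H}} \ge \alpha\lambda_0 , \tag{6.2} \]

which is sharp if $u_\alpha$ is a multiple of the ground state $u_0$.

Proof. If $K u_\alpha = 0$, then the estimate is obviously satisfied. Thus, we restrict our attention to the
case $K u_\alpha \neq 0$ and define $v := u_\alpha / \|K u_\alpha\|_{\mathcal{H}}$. From the dual product of the optimality condition with $v$
we see for a subgradient $p_\alpha \in \partial J(u_\alpha) = \partial J(v)$ that

\[ \langle K^* K u_\alpha, v \rangle_{\mathcal{U}} + \alpha \langle p_\alpha, v \rangle_{\mathcal{U}} = \langle K^* f, v \rangle_{\mathcal{U}} . \]

Due to the one-homogeneity of $J$ we conclude $\langle p_\alpha, v \rangle_{\mathcal{U}} = J(v)$ and hence,

\[ \|K u_\alpha\|_{\mathcal{H}} + \alpha J(v) = \langle f, Kv \rangle_{\mathcal{H}} . \]

By the definition of the ground state we conclude $J(v) \ge \lambda_0$, and by further estimating the
right-hand side via the Cauchy-Schwarz inequality $\langle f, K u_\alpha \rangle_{\mathcal{H}} \le \|f\|_{\mathcal{H}} \|K u_\alpha\|_{\mathcal{H}}$, i.e. $\langle f, Kv \rangle_{\mathcal{H}} \le \|f\|_{\mathcal{H}}$, we have

\[ \|K u_\alpha\|_{\mathcal{H}} + \alpha\lambda_0 \le \|f\|_{\mathcal{H}} , \]

which implies the assertion. □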
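The bounds (6.1) and (6.2) can be illustrated in a simple finite-dimensional setting. The sketch assumes $K = I$ on $\mathbb{R}^N$ with the Euclidean norm and $J = \|\cdot\|_{\ell^1}$; then the ground state value is $\lambda_0 = 1$ (because $\|u\|_{\ell^1} \ge \|u\|_{\ell^2}$ with equality for unit vectors $e_i$) and the minimizer is given by soft-thresholding. The function names are hypothetical.

```python
import math

def soft_threshold(f, alpha):
    """Minimizer of 0.5*||u - f||^2 + alpha*||u||_1 (assumes K = identity)."""
    return [math.copysign(max(abs(x) - alpha, 0.0), x) for x in f]

def norm2(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def bias_quantities(f, alpha, lam0=1.0):
    """Return (||K u_alpha||, right-hand side of (6.1), residual ||K u_alpha - f||)."""
    u = soft_threshold(f, alpha)
    residual = norm2([a - b for a, b in zip(u, f)])
    upper = max(norm2(f) - alpha * lam0, 0.0)
    return norm2(u), upper, residual

for f in ([3.0, 4.0], [0.5, -2.0], [5.0, 0.0]):
    size, upper, residual = bias_quantities(f, alpha=1.0)
    # Expect size <= upper and residual >= alpha * lam0 = 1.
    print(f, size, upper, residual)
```

The last data vector is a multiple of the ground state $e_1$, and there both inequalities hold with equality, in line with the sharpness statement of Theorem 7.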

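The bias in the residual can to some extent also be translated to the regularization functional,
as the following result shows:

Theorem 8. Let $J$ be one-homogeneous, $f = K\tilde{u}$ for $\tilde{u} \in \mathcal{U}$ with $J(\tilde{u}) < \infty$, such that $\|f\|_{\mathcal{H}} > \alpha\lambda_0$.
Moreover let $\alpha > 0$ and

\[ u_\alpha \in \operatorname*{arg\,min}_{u \in \mathcal{U}} \left( \frac{1}{2} \|Ku - f\|_{\mathcal{H}}^2 + \alpha J(u) \right) . \]

Then

\[ J(u_\alpha) \le J(\tilde{u}) - \frac{\alpha}{2} \lambda_0^2 , \tag{6.3} \]

with $\lambda_0$ denoting the smallest singular value.

Proof. By the definition of $u_\alpha$ as a minimizer and (6.2) we conclude

\[ \frac{1}{2} \alpha^2 \lambda_0^2 + \alpha J(u_\alpha) \le \frac{1}{2} \|K u_\alpha - f\|_{\mathcal{H}}^2 + \alpha J(u_\alpha) \le \alpha J(\tilde{u}) , \]

which yields the assertion after dividing by $\alpha$. □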



7 Unbiased Recovery and Inverse Scale Space 

In Section 5 we have seen that in standard variational methods of the form (2.2) one can recover
singular vectors (almost) exactly with a loss of contrast. In this section we want to extend this
topic to the question of exact recovery without a loss of contrast, in the absence and presence of
noise. For this sake we are going to investigate the concept of the inverse scale space flows, which
have displayed superior properties to solutions of (2.2) in several numerical tests (cf. [T!?l I5U]).

Inverse scale space methods can be derived asymptotically from the Bregman iteration

\[ u^{k+1} \in \operatorname*{arg\,min}_{u \in \mathrm{dom}(J)} \left\{ \frac{1}{2} \|Ku - f\|_{\mathcal{H}}^2 + \alpha \left( J(u) - \langle p^k, u \rangle_{\mathcal{U}} \right) \right\} , \tag{7.1} \]

for which the subgradient $p^k \in \partial J(u^k)$ satisfies $p^0 = 0$ and

\[ p^k = p^{k-1} + \frac{1}{\alpha} K^* (f - K u^k) . \tag{7.2} \]

In the limit $\alpha \to \infty$ one can interpret $\Delta t = \frac{1}{\alpha}$ as a time step tending to zero. Thus, we obtain the
inverse scale space flow

\[ \partial_t p(t) = K^* (f - K u(t)) , \qquad p(t) \in \partial J(u(t)) . \tag{7.3} \]

We refer to [THl EH HI 1201 US 1221 [S3] for detailed discussions of the inverse scale space method
and its analysis.

We want to mention that analogous results on exact respectively unbiased reconstruction can
be obtained for the Bregman iteration, clearly with some dependence on the value of $\alpha$; further
details can be found in [10].

7.1 Clean Data 

Similar to Section 5.1 we are going to consider data $f = \gamma K u_\lambda$, with $u_\lambda$ being a singular vector.
For this setup we are able to derive the following result:

Theorem 9. Let $J$ be one-homogeneous and let $u_\lambda$ be a singular vector with corresponding singular
value $\lambda$. Then, if the data $f$ are given by $f = \gamma K u_\lambda$ for a positive constant $\gamma$, a solution of the
inverse scale space flow (7.3) is given by

\[ u(t) = \begin{cases} 0 & \text{if } t < t_* \\ \gamma u_\lambda & \text{if } t \ge t_* \end{cases} \tag{7.4} \]

for $t_* = \lambda/\gamma$.

Proof. First of all we see with Theorem 4 that for $t \le t_*$ we have

\[ p(t) = t K^* f = t \gamma K^* K u_\lambda = \frac{t\gamma}{\lambda} p_\lambda \in \partial J(0) . \]

Since $\partial_t p = K^* f$ and $p(0) = 0$, $u = 0$ is a solution of (7.3).

For time $t \ge t_*$ a continuous extension of $p$ is given by the constant $p(t) = p(t_*)$ and $u(t) = u(t_*)$.
Due to the one-homogeneity, with $t_* = \lambda/\gamma$ we obtain

\[ p(t_*) = p_\lambda \in \partial J(u_\lambda) = \partial J(\gamma u_\lambda) . \]

Thus, $\partial_t p = 0$ yields indeed a solution of (7.3) for $t \ge t_*$. □
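The jump behaviour of Theorem 9 already shows up in the Bregman iteration (7.1), (7.2) for a scalar toy problem. The sketch assumes $\mathcal{U} = \mathcal{H} = \mathbb{R}$, $K = 1$ and $J = |\cdot|$, so that $u_\lambda = 1$, $\lambda = 1$ and $f = \gamma$; each step of (7.1) then reduces to a soft-thresholding of $f + \alpha p^k$. The function names are hypothetical.

```python
import math

def soft(x, alpha):
    """Scalar soft-thresholding, the minimizer of 0.5*(u - x)^2 + alpha*|u|."""
    return math.copysign(max(abs(x) - alpha, 0.0), x)

def bregman_iterates(f, alpha, steps):
    """Run the Bregman iteration for K = 1, J = |.| and return the iterates u^k."""
    u, p, history = 0.0, 0.0, []
    for _ in range(steps):
        # (7.1): minimize 0.5*(u - f)^2 + alpha*(|u| - p*u) over u
        u = soft(f + alpha * p, alpha)
        # (7.2): subgradient update with step size 1/alpha
        p = p + (f - u) / alpha
        history.append(u)
    return history

gamma, alpha = 3.0, 10.0
iterates = bregman_iterates(gamma, alpha, steps=6)
for k, u in enumerate(iterates, start=1):
    # time t_k = k/alpha; Theorem 9 predicts t_* = lambda/gamma = 1/3
    print(k, k / alpha, u)
```

With these values the iterates stay at zero for $k \le 3$, i.e. for $t_k = k/\alpha < t_* = 1/3$, take one intermediate value at $k = 4$ (an effect of the finite step size), and equal $\gamma$ without loss of contrast afterwards.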



7.2 Noisy Data 

As in Section 5 the case of noisy data is a bit more complicated. In order to recover a singular
vector exactly despite the contamination of the data $f$ with noise, we basically need the signal
ratio $\mu$ as introduced in Theorem 6 to equal the singular value $\lambda$. We also mention that the
stopping time is the regularization parameter in inverse scale space methods, thus we can only
expect the exact reconstruction to happen in a time interval $(t_*, t_{**})$. More precisely we obtain
the following result:

Theorem 10. Let $J$ be one-homogeneous and let $u_\lambda$ be a singular vector with corresponding
singular value $\lambda$. The data $f$ is assumed to be corrupted by noise $n$, i.e. $f = \gamma K u_\lambda + n$ for a
positive constant $\gamma$, such that there exist positive constants $\mu$ and $\eta$ that satisfy (5.6) and (5.7).
Then, a solution of the inverse scale space flow (7.3) is given by

\[ u(t) = \begin{cases} 0 & \text{if } t < t_* \\ c u_\lambda & \text{if } t_* \le t < t_{**} \end{cases} \tag{7.5} \]

for

\[ c = \gamma + \frac{\lambda - \mu}{\eta} \tag{7.6} \]

and

\[ t_* = \frac{\eta\lambda}{\lambda + \gamma\eta - \mu} , \qquad t_{**} = \eta . \]

Proof. With a similar argumentation as in the proof of Theorem 9 we obtain $u(t) = 0$ for $t < t_*$
and

\[ p(t_*) = t_* \gamma K^* K u_\lambda + t_* K^* n \]

as the corresponding subgradient to the first non-zero $u$ for a critical time $t_*$. Analogous to the
proof of Theorem 6 we can treat the relation above as a convex combination of $\lambda K^* K u_\lambda$ and (5.6)
for any $\beta \in [0, 1]$, and determine $\beta = \frac{\lambda}{\lambda + \gamma\eta - \mu}$ and subsequently $t_* = \beta\eta$ as above. Moreover, we see
that $u(t_*) = c u_\lambda$ is a feasible solution with subgradient $p(t_*)$, which we extrapolate as constant
for further times up to some time $t_{**}$. Then, from $p(t_*) = t_* K^* f$ and $\partial_t p(t) = K^* (f - c K u_\lambda)$ we
conclude

\[ \begin{aligned}
p(t) &= t K^* f - (t - t_*) c K^* K u_\lambda \\
&= t K^* (\gamma K u_\lambda + n) - (t - t_*) c K^* K u_\lambda \\
&= t K^* n + (\gamma t - c (t - t_*)) K^* K u_\lambda .
\end{aligned} \]

To obtain $p(t) \in \partial J(u(t_*))$ we again compare convex combinations of (5.6) and $\lambda K^* K u_\lambda$ with
parameter $\beta \in [0, 1]$. We need to choose $\beta = t/\eta$, which is only possible for $t \le t_{**} = \eta$. Further we
obtain that $(1 - \beta)\lambda + \beta\mu = \gamma t - c(t - t_*)$ needs to hold, which reduces to $\mu = \gamma t - c(t - t_*)$ at $t = t_{**}$.
This identity can be verified with the above formulas for $t_*$ and $c$. □
8 Further Examples



In the following we shall discuss several further examples to illustrate the use of nonlinear singular 
values: 



8.1 Hilbert Space Norms 

The obvious first starting point is to consider regularizations with Hilbert space norms, i.e.

\[ J(u) = \|u\|_{\mathcal{U}} . \tag{8.1} \]

Note that we focus on the one-homogeneous case here, i.e. we do not use the squared Hilbert
space norm as in the standard formulation of Tikhonov regularization. However, it is easy to
check that there is a one-to-one relation between the regularization parameters of the problems
with squared and non-squared norms, such that they are equivalent. It is straightforward to see
that the singular values are determined from

\[ \lambda K^* K u = \frac{u}{\|u\|_{\mathcal{U}}} , \tag{8.2} \]

and with our normalization we see $\lambda = \|u\|_{\mathcal{U}}$. Thus if $u_n$ is a singular vector in the classical
definition

\[ K^* K u_n = \sigma_n^2 u_n , \tag{8.3} \]

it is also a singular vector in the new definition (at least after appropriate normalization). The
linear singular values $\sigma_n$ are related to the novel values $\lambda_n$ via

\[ \sigma_n^2 = \frac{1}{\lambda_n \|u_n\|_{\mathcal{U}}} = \frac{1}{\lambda_n^2} , \tag{8.4} \]

which is consistent with our original definition as singular values being related to the regularization
functional rather than to $K$.

The fact that singular vectors yield exact solutions of the variational problem is not new and
is directly inferred as a special case of the standard theory (cf. [36]). However, it is surprising
how the behavior of the inverse scale space method changes when rescaling from the squared norm
to the one-homogeneous case. In the case of $J(u) = \frac{1}{2}\|u\|_{\mathcal{U}}^2$ the inverse scale space method is
equivalent to Showalter's method (cf. [62])

\[ \partial_t u = K^* (f - Ku) , \]

and it is well-known that singular vectors follow an exponential dynamic, i.e. if $f = K u_\lambda$ for a
singular vector $u_\lambda$, then

\[ u(t) = (1 - e^{-t}) u_\lambda \]

in case of $K = I$. The behavior changes completely in the one-homogeneous case as we can
conclude from the results in the previous section, since the solution remains zero for a finite time
and then jumps exactly to $u(t) = u_\lambda$ at the critical time.
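The relation between classical and nonlinear singular values can be made concrete for a diagonal operator. The sketch assumes $K = \mathrm{diag}(\sigma_1, \ldots, \sigma_N)$ on $\mathbb{R}^N$ with the Euclidean norm, so that the classical singular vectors are the unit vectors $e_n$ and the normalization $\|Ku\| = 1$ gives $u_n = e_n/\sigma_n$. The function name and the sample values are hypothetical.

```python
import math

def hilbert_singular_pair(sigmas, n):
    """Return lambda_n and both sides of (8.2) for u_n = e_n / sigma_n."""
    u = [0.0] * len(sigmas)
    u[n] = 1.0 / sigmas[n]                        # normalization ||K u|| = 1
    lam = math.sqrt(sum(x * x for x in u))        # lambda = ||u||
    lhs = [lam * s * s * x for s, x in zip(sigmas, u)]  # lambda * K^T K u
    rhs = [x / lam for x in u]                    # u / ||u||
    return lam, lhs, rhs

sigmas = [2.0, 0.5, 0.1]
for n in range(len(sigmas)):
    lam, lhs, rhs = hilbert_singular_pair(sigmas, n)
    # lambda_n should equal 1/sigma_n, and lhs should coincide with rhs.
    print(n, lam, 1.0 / sigmas[n], lhs, rhs)
```

Small classical singular values $\sigma_n$ thus correspond to large values $\lambda_n$, matching the interpretation of $\lambda$ as a frequency.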



8.2 Total Variation 

We have already used the ROF model for denoising at several instances as a simple illustrative
example, in particular in spatial dimension one. In this section we want to extend the
considerations of the one-dimensional ROF model to data that is corrupted by noise. Moreover, we
want to highlight the connection between singular vectors and characteristic functions of so-called
calibrable sets as introduced in [4], with respect to the isotropic total variation functional in
higher spatial dimension. In [27] the theory of calibrable sets has also been extended to more
general (and in particular anisotropic) regularization functionals, which we disregard here for the
sake of simplicity. Instead, we introduce an analytical solution of the anisotropic total variation
regularization in terms of the singular vector definition, similar to the previous examples in spatial
dimension one.

Unbiased Recovery in Practice



In this section we briefly want to illustrate that in practice the violation of the assumptions of
Theorems 9 and 10 can indeed yield undesired artifacts in the reconstruction. We therefore want
to investigate the piecewise constant function $u_4 : [0, 1] \to \{-1, 1\}$,

\[ u_4(x) := \begin{cases} -1 & x \in [0, \frac{1}{4}[ \\ 1 & x \in [\frac{1}{4}, \frac{3}{4}] \\ -1 & x \in \, ]\frac{3}{4}, 1] \end{cases} \tag{8.5} \]

It is straightforward to see that $u_4$ is a singular vector of TV with singular value $\lambda = 4$, and
with the corresponding dual singular vector $p_4$ satisfying the relation $p_4 = q_4'$ (in terms of a weak
derivative) for $q_4$ defined as

\[ q_4(x) := 4 \begin{cases} -x & x \in [0, \frac{1}{4}[ \\ x - \frac{1}{2} & x \in [\frac{1}{4}, \frac{3}{4}] \\ 1 - x & x \in \, ]\frac{3}{4}, 1] \end{cases} \tag{8.6} \]

Both functions are visualized in Figure 4.

Figure 4: The functions $u_4$ (solid blue line) and $q_4$ (dashed red line) as defined in (8.5) and (8.6),
respectively.

Now assume we want to compute the minimizer of (2.4)
for our data given in terms of $f(x) = u_4(x) + n(x)$. Here, $n$ represents a noise function which we
assume to have mean zero (i.e. $\int_0^1 n(x) \, dx = 0$), and to fulfill $N(0) = N(1)$, with $N$ denoting
the primitive of $n$ (i.e. $N'(x) = n(x)$). In order to apply Theorem 6 we need to guarantee the
existence of constants $\mu > 0$ and $\eta \ge 1/\alpha$ such that (5.6) is satisfied. We therefore make the
attempt to define

\[ q(x) := \frac{\mu}{4} q_4(x) + \eta N(x) , \]



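for which we obtain $q(0) = q(1) = 0$, due to the definition of $n$. Moreover, we discover

\[ \langle q', u_4 \rangle_{L^2([0,1])} = \mu + 2\eta \int_{1/4}^{3/4} n(x) \, dx = \mu + 2\eta \left[ N\!\left(\tfrac{3}{4}\right) - N\!\left(\tfrac{1}{4}\right) \right] , \]

which equals $\mathrm{TV}(u_4) = 4$ if $\mu$ satisfies

\[ \mu = 4 - 2\eta \left[ N\!\left(\tfrac{3}{4}\right) - N\!\left(\tfrac{1}{4}\right) \right] . \]

Assume $N(3/4) > N(1/4)$, then we even obtain

\[ \mu \le 4 - \frac{2}{\alpha} \left[ N\!\left(\tfrac{3}{4}\right) - N\!\left(\tfrac{1}{4}\right) \right] < 4 = \lambda , \tag{8.7} \]

due to $\eta \ge 1/\alpha$. Note that in order to obtain $\mu > 0$ we need to ensure

\[ \alpha > \frac{1}{2} \left[ N\!\left(\tfrac{3}{4}\right) - N\!\left(\tfrac{1}{4}\right) \right] , \tag{8.8} \]

otherwise Theorem 6 cannot be applied. Assuming to choose $\eta = 1/\alpha$, the loss of contrast modifies
to

\[ c = 1 - 4\alpha + 2 \left[ N\!\left(\tfrac{3}{4}\right) - N\!\left(\tfrac{1}{4}\right) \right] \]

in case that (8.8) does hold.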



Figure 5: Computational ROF-reconstructions for input datum $f = u_4 + n$, with $u_4$ and $n$ being
defined as in (8.5) and (8.9), respectively. Panels (a) to (c) show reconstructions for three values of $\alpha$,
panels (d) and (e) are closeups on the interval $[0.25, 0.75]$. It is remarkable to see that as soon as $\alpha$ is chosen such
that Theorem 6 cannot be applied, the numerical computations fail to compute a multiple of $u_4$,
indicating the sharpness of the Theorem.

Let us consider a specific example now. We decide to choose the periodic function

\[ n(x) := A \cos(38\pi x) \tag{8.9} \]

to be our noise function, with amplitude $A$ and frequency 38. Note that this noise function
satisfies the mean zero property as well as $N(0) = N(1)$, for $N(x) = (A \sin(38\pi x))/(38\pi)$.
Moreover, we compute $N(3/4) = A/(38\pi)$ and $N(1/4) = -A/(38\pi)$, so that (8.7) now reads
as $\mu \le 4 \left( 1 - A/(38\alpha\pi) \right)$, and according to (8.8), $\alpha$ should be chosen to satisfy

\[ \alpha > \frac{A}{38\pi} . \]

In Figure 5 you can see several computational solutions of (2.4) for the specified input datum
$f = u_4 + n$, for $A = 1/2$ and numerous $\alpha$-values. The computations nicely indicate that as
soon as the assumptions of Theorem 6 are violated, artifacts are introduced in the computational
reconstruction.
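The numbers in this example are quickly reproduced. The sketch below evaluates the primitive $N$, the threshold from (8.8), and the resulting $\mu$ and $c$ for the choice $\eta = 1/\alpha$ and $A = 1/2$; the two sample values of $\alpha$ are arbitrary and are not the ones used for Figure 5, and the function names are hypothetical.

```python
import math

A = 0.5  # noise amplitude used in the text

def N(x):
    """Primitive of n(x) = A*cos(38*pi*x) with N(0) = 0."""
    return A * math.sin(38 * math.pi * x) / (38 * math.pi)

gap = N(0.75) - N(0.25)   # equals A / (19*pi)
alpha_min = 0.5 * gap     # threshold in (8.8), equals A / (38*pi)

def mu(alpha):
    """mu = 4 - 2*eta*[N(3/4) - N(1/4)] for the choice eta = 1/alpha."""
    return 4 - 2 / alpha * gap

def contrast(alpha):
    """c = 1 - 4*alpha + 2*[N(3/4) - N(1/4)] for eta = 1/alpha."""
    return 1 - 4 * alpha + 2 * gap

print(alpha_min, A / (38 * math.pi))
for alpha in (0.01, 0.004):   # one value above and one below the threshold
    print(alpha, mu(alpha), contrast(alpha))
```

For $\alpha = 0.01$ one obtains $\mu > 0$, so Theorem 6 is applicable, whereas $\alpha = 0.004$ lies below $A/(38\pi) \approx 0.0042$ and gives $\mu < 0$.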

In the unbiased case of Theorem 10 we may conclude from the considerations above that we
obtain $u(t) = u_4$ as the solution of the inverse scale space flow (7.3), for $4 < t < (38\pi)/A$ and
$A < (19\pi)/2$.



The ROF Model and Calibrable Sets 

In [35] Y. Meyer has basically proved that the characteristic function of a circle is a singular vector
of isotropic total variation on $\mathbb{R}^N$. In [4] the class of characteristic functions that correspond
to singular vectors in the terminology of this paper has been extended to calibrable sets. In the
following we want to recall properties of calibrable sets and show why they correspond to singular
vectors of total variation.

Definition 5. Let $C \subset \mathbb{R}^2$ be a bounded, convex and connected set with its boundary $\partial C$ being of
class $C^{1,1}$. Then $C$ is called calibrable if there exists a vector field $\xi \in L^\infty(C; \mathbb{R}^2)$ with $\|\xi\|_\infty \le 1$
such that

\[ \begin{aligned}
-\operatorname{div} \xi &= \text{const} = \lambda_C := \frac{P(C)}{|C|} && \text{in } C \\
\xi \cdot \nu &= -1 && \text{a.e. on } \partial C
\end{aligned} \tag{8.10} \]

holds, for $\nu$ denoting the outer unit normal to $\partial C$, $P(C)$ denoting the perimeter and with $|C|$
representing the volume of $C$.

Remark 4. Note that the perimeter simply equals the isotropic total variation $\mathrm{TV}(\chi_C)$ of the
corresponding characteristic function $\chi_C$.

Theorem 11. In J/fl/ it has been proved that for calibrable sets the following conditions are satisfied. 

• $C$ is the solution of the problem

\[ \min \; P(X) - \lambda_C |X| . \]

• The inequality

\[ \operatorname{ess\,sup}_{x \in \partial C} \kappa(x) \leq \lambda_C \]

holds, with $\kappa(x)$ denoting the curvature of $\partial C$ at point $x$.
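The curvature inequality gives a quick test that a candidate set has to pass. The following sketch is only an illustration under our own assumptions: it considers ellipses with semi-axes $a$ and $b$ (smooth convex sets, with largest boundary curvature $\max(a,b)/\min(a,b)^2$), computes the perimeter by a midpoint rule, and compares the curvature with $P(C)/|C|$. The function names are hypothetical.

```python
import math


def ellipse_perimeter(a, b, n=20000):
    """Perimeter of the ellipse (a cos t, b sin t), computed by the midpoint rule."""
    h = 2.0 * math.pi / n
    return h * sum(
        math.hypot(a * math.sin((k + 0.5) * h), b * math.cos((k + 0.5) * h))
        for k in range(n)
    )


def curvature_condition_holds(a, b):
    """Check ess sup kappa <= P(C) / |C| for an ellipse with semi-axes a and b."""
    ratio = ellipse_perimeter(a, b) / (math.pi * a * b)  # P(C) / |C|
    kappa_max = max(a, b) / min(a, b) ** 2               # largest boundary curvature
    return kappa_max <= ratio


print(curvature_condition_holds(1.0, 1.0))  # disc of radius 1
print(curvature_condition_holds(3.0, 1.0))  # elongated ellipse
```

For the unit disc the curvature is 1 and $P(C)/|C| = 2$, so the condition holds. The elongated ellipse violates it and therefore cannot be calibrable.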

In [3] it has already been proved that for calibrable sets $C$ the solution of isotropic total
variation denoising with a characteristic function $\chi_C$ as the input datum satisfies $u(x) = (1 - \lambda_C \alpha)\chi_C(x)$.
Thus, Theorem 5 implies that the input datum already has to be a singular vector with singular
value $\lambda_C$.

We finally mention that Agueh and Carlier [3] have investigated a related class of ground state
problems for total variation with constraints of the form $\int_\Omega G(|u|) \, dx = 1$. Such problems may be
interesting for denoising with noise models different from additive Gaussian, where hardly any
examples of exact solutions exist, except for single ones in [10].

The Anisotropic ROF Model in Two Dimensions

Analytical solutions of the isotropic ROF model have been widely studied and discussed in the
literature. However, analytical solutions of the anisotropic ROF model have not attracted similar
attention, although many of them are easier to describe, as we are going to see in the following.





Figure 6: On-top and three-dimensional view of the singular vector and its dual variables.
Panels: (a) on-top view of $u_{\sqrt{32}}$, (b) 3D view of $u_{\sqrt{32}}$, (c) 3D views of $q^x_{\sqrt{32}}$ and $q^y_{\sqrt{32}}$.



Recall the singular vector $u_4$ from (8.5) and its dual singular vector $p_4 = q_4'$, with $q_4$ being
defined in (8.6). Let us define the two-dimensional functions $q^x_{\sqrt{32}} : [0,1]^2 \to [-1,1]$ and
$q^y_{\sqrt{32}} : [0,1]^2 \to [-1,1]$ with $q^x_{\sqrt{32}}(x,y) := q_4(x)$ and $q^y_{\sqrt{32}}(x,y) := q_4(y)$. Then it is easy to see
that the weak divergence of $q_{\sqrt{32}} := (q^x_{\sqrt{32}}, q^y_{\sqrt{32}})^T$ divided by $\sqrt{32}$ yields the function

\[ u_{\sqrt{32}}(x,y) := \begin{cases} 1 & (x,y) \in [1/4, 3/4]^2 \\ 0 & ((|x - 1/2| < 1/4) \wedge (|y - 1/2| > 1/4)) \vee ((|x - 1/2| > 1/4) \wedge (|y - 1/2| < 1/4)) \\ -1 & \text{else} \end{cases} \tag{8.11} \]

Thus, with the same considerations as before we are able to prove that $u_{\sqrt{32}}$ is a singular vector
of TV with singular value $\lambda = \sqrt{32}$, since we observe

• $q^x_{\sqrt{32}} n_x = 0$ and $q^y_{\sqrt{32}} n_y = 0$, with $n_x$ and $n_y$ denoting the outer unit normals of $q^x_{\sqrt{32}}$ and
$q^y_{\sqrt{32}}$ in x- and in y-direction, respectively

• $\|q_{\sqrt{32}}\|_{L^\infty([0,1]^2;\mathbb{R}^2)} = 1$

• $\langle \operatorname{div} q_{\sqrt{32}}, u_{\sqrt{32}} \rangle_{L^2([0,1]^2;\mathbb{R}^2)} = \mathrm{TV}(u_{\sqrt{32}})$

The singular vector $u_{\sqrt{32}}$ and the dual vectors $q^x_{\sqrt{32}}$ and $q^y_{\sqrt{32}}$ are visualized in Figure 6.

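The singular value can be cross-checked on a grid. The sketch below is an illustration under our own assumptions: the one-dimensional profile is taken to be $-1$ on $[0,1/4)$ and $(3/4,1]$ and $+1$ in between, the two-dimensional pattern is built from it, and the anisotropic total variation is summed over jumps across cell edges. Since the quotient $\mathrm{TV}(u)/\|u\|_{L^2}$ does not depend on the scaling of $u$, it equals the singular value of the normalized pattern.

```python
import numpy as np


def profile(n):
    """Cell-centre samples of the 1D profile: +1 on (1/4, 3/4), -1 elsewhere."""
    x = (np.arange(n) + 0.5) / n
    return np.where(np.abs(x - 0.5) < 0.25, 1.0, -1.0)


def anisotropic_tv(u, h):
    """Sum of |jumps| across cell edges, each edge having length h."""
    jumps_x = np.abs(np.diff(u, axis=0)).sum()
    jumps_y = np.abs(np.diff(u, axis=1)).sum()
    return h * (jumps_x + jumps_y)


n = 64          # cells per axis; divisible by 4 so that all jumps sit on cell edges
h = 1.0 / n
g = profile(n)
u = 0.5 * (g[:, None] + g[None, :])   # pattern with values 1, 0 and -1

l2_norm = np.sqrt(h * h * (u ** 2).sum())
quotient = anisotropic_tv(u, h) / l2_norm
print(quotient, np.sqrt(32.0))
```

Both printed numbers agree up to rounding, because the discretization is exact for piecewise constant functions aligned with the grid.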
Denoising Vector Fields 

Another generalization of the one-dimensional ROF model to multiple dimensions has been
discussed recently in [15], namely the denoising of vector fields $f \in L^2(\Omega;\mathbb{R}^n)$ via minimizing

\[ u = \operatorname{argmin} \left\{ \frac{1}{2} \int_\Omega (f - u)^2 \, dx + \int_\Omega |\operatorname{div} u| \, dx \right\} . \tag{8.12} \]

As in the case of total variation, the $L^1$-norm of the divergence has to be generalized to a weak
form

\[ J(u) = \sup_{\|\varphi\|_{L^\infty(\Omega)} \leq 1} \int_\Omega u \cdot \nabla\varphi \, dx . \tag{8.13} \]

Concerning our investigation of ground states and singular values, this model yields an example
with a huge set of trivial ground states. Any function $u \in L^2(\Omega;\mathbb{R}^n)$ such that $\nabla \cdot u = 0$ is obviously
a ground state. In order to compute a nontrivial ground state we obtain the condition

\[ \int_\Omega u \cdot v \, dx = 0 \quad \text{for all } v \text{ with } \nabla \cdot v = 0, \]

and that $u$ has to be a gradient field, i.e.

\[ u = \nabla q . \]

The scalar $q$ is obtained from the minimization of

\[ \int_\Omega |\Delta q| \, dx \quad \text{subject to} \quad \int_\Omega \|\nabla q\|^2_{\ell^2(\mathbb{R}^n)} \, dx = 1 . \]
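The difference between trivial and nontrivial candidates can be seen in a small numerical sketch. It is only an illustration: the two fields, the unit square as domain and the helper `l1_divergence` are our own choices. The rotation field $(-y, x)$ is divergence-free, so the functional vanishes, while the gradient field of $q(x,y) = x^2 + y^2$ has constant divergence 4.

```python
import numpy as np


def l1_divergence(ux, uy, h):
    """Approximate the integral of |div u| over the unit square by the mean of |div u|."""
    div = np.gradient(ux, h, axis=0) + np.gradient(uy, h, axis=1)
    return np.abs(div).mean()


n = 65
h = 1.0 / (n - 1)
x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n), indexing="ij")

rotation = (-y, x)             # divergence-free field
gradient = (2.0 * x, 2.0 * y)  # gradient of q = x^2 + y^2, divergence 4

print(l1_divergence(*rotation, h), l1_divergence(*gradient, h))
```

Finite differences are exact for the linear components used here, so the printed values are 0 and 4 up to rounding.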

8.3 Support Pursuit 

While sparsity regularization with discrete $\ell^1$-functionals has been studied extensively in the last
decade, the continuum analog was investigated only recently. At first glance it seems that $L^1(\Omega)$
would be the straightforward extension in terms of function spaces, but similar to the case of
total variation the lack of a weak-star topology in $L^1(\Omega)$ prevents the applicability and often the
existence of minimizers. Again the solution is to extend the variational approach to a slightly
larger space, which is a dual space. In this case this dual space is the space of Radon measures
$\mathcal{M}(\Omega)$, which is the dual space of $C_0(\Omega)$. The appropriate regularization functional as introduced
in [13] is the zero-order total variation of a measure $\mu \in \mathcal{M}(\Omega)$, i.e.

\[ J(\mu) = \sup_{\varphi \in C_0(\Omega),\ \|\varphi\|_\infty \leq 1} \int_\Omega \varphi \, d\mu . \tag{8.14} \]

The setup used in [13] is to choose $K = L^*$, with $L : \mathcal{H} \to C_0(\Omega)$ being a bounded linear operator.
This allows us to avoid working with the complicated dual space of $\mathcal{M}(\Omega)$ for some arguments. We
note that in this setup regular singular vectors can rather be obtained from

\[ \lambda L L^* \mu \in \partial J(\mu) . \tag{8.15} \]

In [3D] this problem was analyzed in a compressed sensing setting, when K consists of a finite 
number of forward projections (in a polynomial basis), so that L can be written down explicitly. 



Bounded Operators 

Here we will consider two related cases including most practical examples we are aware of (except
the Radon integral transform): First of all we analyze integral operators $K_\infty : \mathcal{M}(\Omega) \to L^2(\Sigma)$ of
the form

\[ (K_\infty \mu)(x) = \int_\Omega k(x,y) \, d\mu(y) \tag{8.16} \]

with a continuous kernel $k$. Then we shall turn to projection operators with $M$ measurements
$K_M : \mathcal{M}(\Omega) \to \mathbb{R}^M$ of the form

\[ K_M \mu = \left( \int_\Omega k_j(x) \, d\mu(x) \right)_{j=1,\ldots,M} , \tag{8.17} \]

assuming again the $k_j$ to be continuous. Notice that $K_M$ can also be thought of as a
semidiscretization of the operator $K_\infty$, e.g. by collocation methods, so it is natural to compare at least
ground states for those operators.

Our aim is to verify a natural extension of the $\ell^1$-case, where the ground state is a vector with
a single nonzero entry. The natural analogue in the space of Radon measures is a concentrated
measure $\delta_x$ for $x \in \Omega$ (we use the notation $\delta_x$ due to the relation to the Dirac delta distribution),
with

\[ \int_\Omega \varphi(y) \, d\delta_x = \varphi(x) \quad \forall\, \varphi \in C_0(\Omega) . \tag{8.18} \]

Indeed we can show that ground states are of this form and give a reasonably simple condition on
their location.

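Theorem 12. Let $K_\infty$ and $K_M$ be as above. Then

• A ground state $\mu_0^\infty$ of $K_\infty$ is given by $\mu_0^\infty = c\,\delta_z$, with $z$ satisfying

\[ \int_\Sigma k(x,z)^2 \, dx \geq \int_\Sigma k(x,y)^2 \, dx \quad \forall\, y \in \Omega , \]

and with $c$ fulfilling

\[ c = \lambda_0^\infty = \frac{1}{\sqrt{\int_\Sigma k(x,z)^2 \, dx}} . \]

• A ground state $\mu_0^M$ of $K_M$ is given by $\mu_0^M = c\,\delta_z$, with $z$ satisfying

\[ \sum_{j=1}^M k_j(z)^2 \geq \sum_{j=1}^M k_j(y)^2 \quad \forall\, y \in \Omega , \]

so that $c$ fulfills

\[ c = \lambda_0^M = \frac{1}{\sqrt{\sum_{j=1}^M k_j(z)^2}} . \]

Proof. We prove a slightly more general version: for an operator $K : \mathcal{M}(\Omega) \to \mathcal{H}$, $\mu_0 = c\,\delta_z$ is a
ground state if

\[ f(y) := \|K\delta_y\|_{\mathcal{H}} \]

attains a maximum at $z$. This yields the statements of the theorem as special cases.
Define $c = \lambda = \frac{1}{\|K\delta_z\|_{\mathcal{H}}}$. Then we see that

\[ \langle \lambda K^*K\mu_0, \mu_0 \rangle_{\mathcal{M}(\Omega)} = c = J(\mu_0) . \]

Moreover, for any $\mu \in \mathcal{M}(\Omega)$ we have

\[ \langle \lambda K^*K\mu_0, \mu \rangle_{\mathcal{M}(\Omega)} = \lambda c \, \langle K\delta_z, K\mu \rangle_{\mathcal{H}} \leq \lambda \|K\mu\|_{\mathcal{H}} . \]

It thus remains to verify that $\lambda^2 \|K\mu\|_{\mathcal{H}}^2 \leq J(\mu)^2$, or equivalently that

\[ \|K\mu\|_{\mathcal{H}}^2 \leq J(\mu)^2 \, \|K\delta_z\|_{\mathcal{H}}^2 \]

holds. First of all we discover

\[ \|K\mu\|_{\mathcal{H}}^2 = \langle \mu, K^*K\mu \rangle_{\mathcal{M}(\Omega)} \leq J(\mu) \, \|K^*K\mu\|_\infty . \]

For $y \in \Omega$ we further estimate

\[ |(K^*K\mu)(y)| = |\langle K\delta_y, K\mu \rangle_{\mathcal{H}}| \leq \|K\delta_y\|_{\mathcal{H}} \, \|K\mu\|_{\mathcal{H}} , \]

and thus obtain

\[ \|K^*K\mu\|_\infty = \sup_y |(K^*K\mu)(y)| \leq \sup_y f(y) \, \|K\mu\|_{\mathcal{H}} . \]

Since $\sup_y f(y) = \|K\delta_z\|_{\mathcal{H}}$, inserting this into the previous estimate and dividing by $\|K\mu\|_{\mathcal{H}}$
finally yields the assertion. □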
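For the operator $K_M$ the theorem reduces the search for a ground state to a one-dimensional maximization. The following is a minimal sketch under our own assumptions: the Gaussian kernels $k_j$, their centres and the search grid on $[0,1]$ are hypothetical choices, and the maximizer is only located on the grid.

```python
import math


def ground_state_location(kernels, grid):
    """Return (z, c): the grid point maximizing sum_j k_j(y)^2 and c = 1 / ||K delta_z||."""
    def f_squared(y):
        return sum(k(y) ** 2 for k in kernels)

    z = max(grid, key=f_squared)
    return z, 1.0 / math.sqrt(f_squared(z))


# Hypothetical measurement kernels: Gaussians centred at three sensor positions.
centres = [0.2, 0.5, 0.6]
kernels = [lambda y, m=m: math.exp(-((y - m) ** 2) / 0.02) for m in centres]
grid = [i / 200 for i in range(201)]

z, c = ground_state_location(kernels, grid)
print(z, c)
```

Any discrete signed measure with weights $w_i$ on grid points then satisfies $\|K_M \mu\| \leq \sum_i |w_i| / c$, which is the inequality established in the proof.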



An Unbounded Operator 

As a simple sketch of the ROF model in three dimensions (when the embedding of BV into $L^2$ fails), we
study the case of $K^*K$ being the inverse Laplacian, i.e. $Ku = v$, with $v \in H_0^1(\Omega)$ solving $-\Delta v = u$
if $u \in \mathcal{M}(\Omega) \cap H^{-1}(\Omega)$ and for $\Omega$ denoting the unit ball in $\mathbb{R}^3$. Since a Dirac delta distribution is
not an element of $H^{-1}(\Omega)$ in spatial dimension three, we cannot obtain such measures as singular
vectors, hence the latter can at most be concentrated on manifolds with higher dimension.
The equation for the singular vector becomes

\[ \lambda\mu = -\Delta p, \quad p \in \partial J(\mu) \]

and we can look for radially symmetric solutions $\mu = M(r)$, $p = P(r)$. Hence

\[ \lambda r^2 M = -\partial_r (r^2 \partial_r P) \]

with the additional condition $P(1) = 0$. Now a canonical measure concentrated on a codimension
one manifold is the one on a sphere with radius $R \in (0,1)$, which corresponds to $M$ being a
concentrated measure in $r = R$, i.e. $M = c\,\delta_R$ with $c$ to be determined from the normalization
condition

\[ \int_0^1 P(r)^2 r^2 \, dr = 1 . \]

Then $P$ can be computed as

\[ P(r) = \begin{cases} \lambda c \frac{R^2}{r} (1 - r) & \text{if } r > R \\ \lambda c R (1 - R) & \text{if } r \leq R \end{cases} \tag{8.19} \]

Now $P$ attains a maximum at $r = R$, and we need to choose $\lambda$ such that $P(R) = 1$ holds, which
yields $\lambda = \frac{1}{cR(1-R)}$. Hence, we conclude that a measure concentrated on a sphere of radius $R$ is a
singular vector; the smallest singular value in this class is obtained for $R = 1/2$. Note that $\lambda \to \infty$
for $R \to 1$ or $R \to 0$, which confirms that neither a concentrated measure in the origin nor a
measure concentrated on $\partial\Omega$ is a singular vector.
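The radial profile can be checked with a few lines of code. This is a sketch under a simplifying assumption of ours: the constant $c$ is treated as a fixed number independent of $R$, so that $\lambda = 1/(cR(1-R))$ depends on $R$ only through $R(1-R)$. The function names are illustrative.

```python
def dual_profile(r, R, c, lam):
    """P(r) for M = c * delta_R: constant for r <= R, decaying to P(1) = 0 for r > R."""
    if r <= R:
        return lam * c * R * (1.0 - R)
    return lam * c * R ** 2 * (1.0 - r) / r


def singular_value(R, c=1.0):
    """lambda = 1 / (c R (1 - R)), chosen such that P(R) = 1."""
    return 1.0 / (c * R * (1.0 - R))


for R in (0.1, 0.25, 0.5, 0.75, 0.9):
    lam = singular_value(R)
    print(R, lam, dual_profile(R, R, 1.0, lam), dual_profile(1.0, R, 1.0, lam))
```

For fixed $c$ the printed values of $\lambda$ are smallest at $R = 1/2$ and grow without bound towards both ends of the interval, while $P(R) = 1$ and $P(1) = 0$ hold in every case.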
8.4 Sparsity and Variants

As mentioned above, sparsity-enforcing regularizations have played an important role in image analysis
and inverse problems in recent years. The usual setup in $\ell^1$-regularization is $\mathcal{U} = \ell^1(\mathbb{R}^n)$ and
$\mathcal{H} = \ell^2(\mathbb{R}^m)$, so the forward operator can be identified with a matrix $K \in \mathbb{R}^{m \times n}$. The proof of
Theorem 12 immediately implies that a ground state is given by $e_i$, where $i \in \{1, \ldots, n\}$ is the
index of a column with maximal Euclidean norm, i.e. $\|Ke_i\|_2 \geq \|Ke_j\|_2$ for any $j \in \{1, \ldots, n\}$. In
the following we discuss two other relevant examples related to sparsity and their ground states,
respectively singular vectors.
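In the matrix case the ground state can be written down and tested directly. The sketch below is an illustration with a random matrix of our own choosing: it selects the column of $K$ with the largest Euclidean norm, scales the corresponding unit vector such that $\|Ku\|_2 = 1$, and compares its quotient $\|u\|_1 / \|Ku\|_2$ with those of random trial vectors.

```python
import numpy as np


def l1_ground_state(K):
    """Return (i, u) with u = e_i / ||K e_i||_2 for the column i of largest norm."""
    col_norms = np.linalg.norm(K, axis=0)
    i = int(np.argmax(col_norms))
    u = np.zeros(K.shape[1])
    u[i] = 1.0 / col_norms[i]
    return i, u


rng = np.random.default_rng(0)
K = rng.standard_normal((5, 8))
i, u0 = l1_ground_state(K)
lam0 = np.abs(u0).sum() / np.linalg.norm(K @ u0)   # J(u0) / ||K u0||

# No random trial vector should achieve a smaller quotient J(u) / ||K u||.
trials = rng.standard_normal((1000, 8))
quotients = np.abs(trials).sum(axis=1) / np.linalg.norm(trials @ K.T, axis=1)
print(i, lam0, quotients.min())
```

The smallest quotient among the trials stays above `lam0`, in line with the estimate $\|Ku\|_2 \leq \|u\|_1 \max_j \|Ke_j\|_2$.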

Low Rank 

In order to compute matrix-valued solutions of low rank, the nuclear norm has been considered in
various papers recently, respectively shown to be an exact relaxation of minimal rank problems in
some cases (cf. [37J[5S|). In this case $\mathcal{U} = \mathbb{R}^{m \times n}$ and $\mathcal{H} = \mathbb{R}^M$ with

\[ J(u) = \sum_{i=1}^{\min\{m,n\}} \sigma_i(u) , \tag{8.20} \]

for which $\sigma_i$ are the (classical) singular values of the matrix $u$, and with $M \ll (m \times n)$ denoting
the number of known entries of $u$.

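It seems natural that a ground state is of rank one. To see this, we take an arbitrary matrix
$u$ with singular value decomposition

\[ u = \sum_{j=1}^{\min\{m,n\}} \sigma_j U_j V_j^T . \]

Then we have

\[ \|Ku\|_{\mathcal{H}} = \Big\| \sum_{j=1}^{\min\{m,n\}} \sigma_j K U_j V_j^T \Big\|_{\mathcal{H}} \leq \sum_{j=1}^{\min\{m,n\}} \sigma_j \| K U_j V_j^T \|_{\mathcal{H}} . \]

Equality is obtained if $u$ is a rank-one matrix. Thus, we conclude that the ground state is of
rank one and obviously it is a multiple of $UV^T$, where $U$ and $V$ maximize $\|KUV^T\|_{\mathcal{H}}$ among all
orthogonal matrices.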
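The triangle inequality argument can be illustrated numerically. In the following sketch the operator $K$ is a hypothetical random linear map acting on the vectorized matrix, which is our own choice and not a sampling operator from the literature. The code compares $\|Ku\|$ with the weighted sum over the rank-one terms of the singular value decomposition and confirms that both sides coincide for a rank-one matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, M = 4, 3, 6
K = rng.standard_normal((M, m * n))   # hypothetical linear operator on vec(u)


def apply(K, u):
    """Apply K to the matrix u via its row-major vectorization."""
    return K @ u.reshape(-1)


u = rng.standard_normal((m, n))
U, sigma, Vt = np.linalg.svd(u, full_matrices=False)

lhs = np.linalg.norm(apply(K, u))
rhs = sum(
    s * np.linalg.norm(apply(K, np.outer(U[:, j], Vt[j])))
    for j, s in enumerate(sigma)
)
print(lhs, rhs, sigma.sum())          # lhs <= rhs, and sigma.sum() equals J(u)

rank_one = np.outer(U[:, 0], Vt[0])   # for a rank-one matrix both sides coincide
print(np.linalg.norm(apply(K, 3.0 * rank_one)), 3.0 * np.linalg.norm(apply(K, rank_one)))
```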



Joint Sparsity 

In some applications it is more reasonable that few groups of variables have nonzero entries instead
of just a few single variables being nonzero. This is modeled by so-called joint sparsity or group
lasso approaches (cf. [S7J [73J [ST]), the most prominent example being

\[ J(u) = \sum_i \sqrt{\sum_{j=1}^{m_i} |u_{ij}|^2} \tag{8.21} \]

in $\mathcal{U} = \mathbb{R}^N$ and $\mathcal{H} = \mathbb{R}^M$, where $N = \sum_i m_i$. Since the goal is to obtain group sparsity, we
expect solutions such that $u_{i\cdot}$ vanishes for most indices $i$. In particular one expects the ground
state such that $u_{i\cdot}$ is a nonzero vector for only a single index $i$.

In order to characterize the ground state we introduce the matrices $A_i \in \mathbb{R}^{M \times m_i}$ representing
the linear operator $K$ restricted to elements $u$ supported in the index set $\{i\} \times \{1, \ldots, m_i\}$. We
shall also use the notation $U_i = (u_{ij})_{j=1,\ldots,m_i} \in \mathbb{R}^{m_i}$. With those we can write

\[ J(u) = \sum_i \|U_i\|_{\ell^2(\mathbb{R}^{m_i})} , \qquad \|Ku\|_{\ell^2(\mathbb{R}^M)} = \Big\| \sum_i A_i U_i \Big\|_{\ell^2(\mathbb{R}^M)} . \]

By the triangle inequality

\[ \|Ku\|_{\ell^2(\mathbb{R}^M)} \leq \sum_i \|A_i U_i\|_{\ell^2(\mathbb{R}^M)} \leq \sum_i \sigma_i^{\max} \|U_i\|_{\ell^2(\mathbb{R}^{m_i})} \leq J(u) \max_i \sigma_i^{\max} , \tag{8.22} \]

for which $\sigma_i^{\max}$ is the largest singular value of $A_i$. Equality is obtained if $U_k$ is the singular vector
corresponding to singular value $\sigma_k^{\max}$ with

\[ \sigma_k^{\max} \geq \sigma_i^{\max} \quad \forall\, i \neq k \]

and all other $U_i$ equal zero. Thus, there are ground states with only one row different from zero,
which perfectly corresponds to the motivation of group sparsity. The pattern of the nonzero row
is a classical singular vector of the restricted matrix $A_i$.
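The characterization of the ground state can be tried out on random data. The sketch below is an illustration with hypothetical block sizes and random blocks $A_i$: it picks the block with the largest top singular value, places the corresponding right singular vector in that group, and compares the quotient $J(u)/\|Ku\|$ with $1/\max_i \sigma_i^{\max}$.

```python
import numpy as np


def group_ground_state(blocks):
    """Return (k, U_k, sigma_k) for the block A_k with the largest top singular value."""
    tops = []
    for A in blocks:
        _, s, Vt = np.linalg.svd(A, full_matrices=False)
        tops.append((s[0], Vt[0]))   # top singular value and right singular vector
    k = int(np.argmax([s for s, _ in tops]))
    return k, tops[k][1], tops[k][0]


def quotient(blocks, groups):
    """J(u) / ||K u|| for u given as a list of group vectors U_i."""
    J = sum(np.linalg.norm(U) for U in groups)
    Ku = sum(A @ U for A, U in zip(blocks, groups))
    return J / np.linalg.norm(Ku)


rng = np.random.default_rng(2)
blocks = [rng.standard_normal((6, mi)) for mi in (2, 3, 4)]   # A_i in R^{M x m_i}
k, Uk, sigma_k = group_ground_state(blocks)

ground = [Uk if i == k else np.zeros(A.shape[1]) for i, A in enumerate(blocks)]
print(k, quotient(blocks, ground), 1.0 / sigma_k)
```

The two printed quotients agree, and by (8.22) no other choice of groups can produce a smaller value.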



8.5 Infimal-Convolution Regularization 

Due to deficiencies of standard regularization functionals, constructions like infimal convolution of 
multiple regularization functionals have been considered recently (cf. [2S [3 |65l |6TJ [13] ) . The 
infimal convolution (inf-convolution) of two convex functionals $J_1$ and $J_2$ is defined as

\[ J(u) := \inf_v \left( J_1(v) + J_2(u - v) \right) , \tag{8.23} \]

and appears to be a good way to combine the advantages of different regularization functionals.
Ideally one would hope that the inf-convolution of $J_1$ and $J_2$ can lead to exact reconstruction of
all solutions that are reconstructed exactly with $J_1$ or $J_2$.
Since the related variational problem can be reformulated as

\[ (v,w) = \operatorname{argmin}_{v,w} \left\{ \frac{1}{2} \|K(v + w) - f\|^2 + \alpha \left( J_1(v) + J_2(w) \right) \right\} , \tag{8.24} \]

we can directly consider singular vectors in the product space for $u = (v,w)$, which are
characterized by

\[ \lambda K^*K(v + w) = p_1 = p_2, \quad p_1 \in \partial J_1(v), \quad p_2 \in \partial J_2(w) . \tag{8.25} \]

In general we cannot expect that singular vectors of $J_1$ or $J_2$ are again singular vectors of the
inf-convolution. The simplest case would be a singular vector $v$ with

\[ \lambda K^*Kv = p_1 \in \partial J_1(v), \quad w = 0 . \]

Then we need that $p_1 \in \partial J_2(0)$, which is difficult to achieve for general combinations. However,
the construction works at least for the ground state of one-homogeneous functionals $J_1$ and $J_2$. Let
$v_0$ be the ground state of $J_1$ and $w_0$ be the one of $J_2$. Moreover we assume that $J_1(v_0) \leq J_2(w_0)$.
Then we can estimate

\[ J_1(v) + J_2(w) \geq J_1(v_0)\|Kv\| + J_2(w_0)\|Kw\| \geq J_1(v_0)\|K(v + w)\| . \]

Equality is achieved if $w = 0$ and $v = v_0$, hence the ground state of $J_1$ is also a ground state of
$J$. Note that for $J_2(w_0) > J_1(v_0)$ we may conclude that $w_0$ is not a ground state, potentially not
even a singular vector. Since such inequalities depend on the scaling of $J_1$ and $J_2$ this suggests
that one should use a scaling such that the smallest singular values are equal.
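The estimate for the ground state can be illustrated for a concrete pair of one-homogeneous functionals. The choices in the following sketch are ours and purely illustrative: $J_1$ is the $\ell^1$-norm, $J_2$ is a scaled Euclidean norm, and $K$ is a random matrix. The scaling `beta` is chosen such that the ground state value of $J_2$ is twice that of $J_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
K = rng.standard_normal((5, 7))

lam1 = 1.0 / np.linalg.norm(K, axis=0).max()   # J_1(v_0) for J_1 = l1-norm
beta = 2.0 * np.linalg.norm(K, 2) * lam1       # scaling of J_2 = beta * l2-norm
lam2 = beta / np.linalg.norm(K, 2)             # J_2(w_0), here equal to 2 * lam1


def J1(v):
    return np.abs(v).sum()


def J2(w):
    return beta * np.linalg.norm(w)


v, w = rng.standard_normal(7), rng.standard_normal(7)
lower = lam1 * np.linalg.norm(K @ (v + w))
print(J1(v) + J2(w), lower)                    # the first value bounds the second
```

With this scaling the smaller ground state value `lam1` is also the ground state value of the inf-convolution, attained for $w = 0$ and $v = v_0$.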



9 Conclusions and Open Problems 

In this paper we have generalized the notion of singular values and singular vectors to nonlinear 
regularization methods in Banach spaces and demonstrated their usefulness in the analysis. In 
particular we have derived results on the bias of variational methods and scale estimates, which
have a particular geometric interpretation in the case of total variation denoising. Moreover, we
have shown that singular vectors are the solutions that can be reconstructed exactly (up to a 
multiplicative constant) by variational regularization techniques. 

A major open problem is to obtain a constructive approach for computing singular values and 
singular vectors, or at least ground states, of arbitrary problems either analytically or numerically. 
A computational approach for similar problems was already discussed in |40j , as well as for similar 
problems with quadratic constraints in [6, 4SJ [501 IM1 HO] • Our computational experiments indicate 
that such approaches can indeed compute singular vectors, however they do not converge robustly 
to the ground state and it is difficult to control to which singular vector the method will converge.